Possible regression in node.replace(X) when X is a huge string of markup ("error parsing fragment (1)") #1442
Thanks for reporting this, I'll take a look and try to reproduce.
Can you please provide the output from
More precisely: can you please provide
I've reproduced this issue, seeing an error with 1.6.7.2 and not seeing an error with 1.6.6.4. Both versions of nokogiri were compiled against the vendored libxml2, and were running on Ruby 2.3.0.
OK, this behavior was introduced in commit 9eb540e, which applied upstream libxml2 patches to the vendored libxml2 2.9.2.
And notably this behavior is also present in nokogiri 1.6.8.rc3, which uses libxml2 2.9.3. This indicates to me that this is entirely due to upstream behavior introduced in libxml2.
Right, OK, so if you rescue the exception and examine the document's errors, you'll see:
which simply means that the nesting depth exceeds what libxml2 allows. Notably, you're not setting the HUGE config on the document. That said, the HUGE option doesn't seem to do anything: a doc with this option set or not set still raises this error at SIZE_THRESHOLD 7484. In any case, if you feel strongly that this is a bug, I'd ask that you submit a bug report upstream to the libxml2 project. Since there's nothing Nokogiri can do to change the behavior of libxml2, I'm going to close this issue. If you'd like to continue the conversation, I'm happy to keep talking; just respond here. Sorry this isn't a more helpful response.
Please note I've created #1443 to investigate the possibility that we're not passing parse options into the fragment parser; this is related to my note above about the HUGE option having no apparent effect.
Hi @flavorjones, a quick correction: I originally ran into this on my Mac laptop, but then moved over to a Linux box to eventually come up with the script I attached. Here's the info for both versions on the Linux box:
1.6.7.2
1.6.6.4
Thanks for taking a look at this issue. In terms of whether it's a bug or not: I should first bring up that I inadvertently muddled what I now see as two distinct issues in the script I originally attached.
Despite specifying the
That being said, I don't know that it would be trivial to carry over that config when dealing with arbitrary markup strings as a source to a
While this does not:
Which makes me feel like my usage of the
Hi @flavorjones, just wanted to see if you saw my previous comment?
Given `Nokogiri::HTML` document `A` (a large HTML document with many nested tags) and `Nokogiri::HTML` document `B` (which can be a simple document: `<html><body></body></html>`), when trying to replace a node in `A` with the string contents of `B`, this error is thrown:

This occurs with Nokogiri gem version `1.6.7.2`. It does not seem to occur with Nokogiri `1.6.6.4`. I am on Mac OS X 10.11.3. I think Nokogiri was built using the system libxml2 library: `/usr/lib/libxml2.2.dylib`.

I have attached a small script (nokogiri_large_doc_replace_test.zip) to reproduce the problem in version `1.6.7.2`. The script will create the large document `A` with many nested elements until it reaches an arbitrary size, then create the document `B` and attempt to do `documentB.replace(large_markup_inner_html_string_from_A)` to see if the exception is thrown.

The arbitrary size threshold that throws an error on my system is noted in the constant at the beginning of the script: `SIZE_THRESHOLD = 16505`. If I set this to something low (say `10`), I do not get the exception.

I am afraid I have not yet pinpointed the change or cause that results in this error in the newer version. However, maybe these bits I found while digging into it will help (and not misdirect!):
1. When `node.replace` is called with a markup string argument, `#coerce` is called, which calls `#fragment`, which creates a new `DocumentFragment`.
2. `DocumentFragment`'s initializer checks for a context (which exists when coming from `node.fragment(string)`), and then calls `#parse` on it.
3. The `#parse` method builds a new `options` object with some defaults.
4. Alas, the `huge` option one may have set on the original document does not seem to be available here to propagate to this object. The subsequent `in_context` call fails, presumably because the underlying parser reaches the arbitrary limit and breaks. If I modify this `options` object and add the `huge` option, the fragment is created successfully.

I should emphasize that even though I found a semblance of a cause in step 4, I do not yet know what changed between `1.6.6.4` and `1.6.7.2`; maybe it was something else!

Update 1: I just realized that in the test script, line 46, I am doing a control experiment, where I read the large markup string into a new `Nokogiri::HTML` document, and in that call I am not using the `huge` option, yet it still works. So perhaps my usage of `huge` is masking an underlying problem.