From e0da4d229f36c8a6035bdc675d429e9df0918aa6 Mon Sep 17 00:00:00 2001
From: Mike Dalessio
Date: Sun, 31 Jan 2021 10:07:14 -0500
Subject: [PATCH] fix(cruby): patch libxml2 to address GNOME/libxml2#200

This patch shrinks the libxml2 input buffer in a few parser functions.

Fixes #2132
---
 CHANGELOG.md                                  |  1 +
 ...nk-the-input-buffer-when-appropriate.patch | 70 +++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 patches/libxml2/0010-parser.c-shrink-the-input-buffer-when-appropriate.patch

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 02f4c831cf..e2c1da982f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
 ### Fixed
 
 * [CRuby] `NodeSet` may now safely contain `Node` objects from multiple documents. Previously the GC lifecycle of the parent `Document` objects could lead to contained nodes being GCed while still in scope. [[#1952](https://github.com/sparklemotion/nokogiri/issues/1952)]
+* [CRuby] Patch libxml2 to avoid "huge input lookup" errors on large CDATA elements. (See upstream [GNOME/libxml2#200](https://gitlab.gnome.org/GNOME/libxml2/-/issues/200) and [GNOME/libxml2!100](https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/100).) [[#2132](https://github.com/sparklemotion/nokogiri/issues/2132)].
 * [CRuby] `{XML,HTML}::Document.parse` now invokes `#initialize` exactly once. Previously `#initialize` was invoked twice on each object.
 * [JRuby] `{XML,HTML}::Document.parse` now invokes `#initialize` exactly once. Previously `#initialize` was not called, which was a problem for subclassing such as done by `Loofah`.
 
diff --git a/patches/libxml2/0010-parser.c-shrink-the-input-buffer-when-appropriate.patch b/patches/libxml2/0010-parser.c-shrink-the-input-buffer-when-appropriate.patch
new file mode 100644
index 0000000000..d3d9ad46a4
--- /dev/null
+++ b/patches/libxml2/0010-parser.c-shrink-the-input-buffer-when-appropriate.patch
@@ -0,0 +1,70 @@
+From ca565c1edef9a455453fa8564270cc9c5813e1b9 Mon Sep 17 00:00:00 2001
+From: Mike Dalessio
+Date: Sun, 31 Jan 2021 09:53:56 -0500
+Subject: [PATCH] parser.c: shrink the input buffer when appropriate
+
+Fixes GNOME/libxml2#200
+
+Also see discussions at:
+- GNOME/libxml2#192
+- https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e
+- https://github.com/sparklemotion/nokogiri/issues/2132
+---
+ parser.c | 6 ++++++
+ 1 file changed, 6 insertions(+)
+
+diff --git a/parser.c b/parser.c
+index a7bdc7f..efde672 100644
+--- a/parser.c
++++ b/parser.c
+@@ -4204,6 +4204,7 @@ xmlParseSystemLiteral(xmlParserCtxtPtr ctxt) {
+         }
+         count++;
+         if (count > 50) {
++            SHRINK;
+             GROW;
+             count = 0;
+             if (ctxt->instate == XML_PARSER_EOF) {
+@@ -4291,6 +4292,7 @@ xmlParsePubidLiteral(xmlParserCtxtPtr ctxt) {
+         buf[len++] = cur;
+         count++;
+         if (count > 50) {
++            SHRINK;
+             GROW;
+             count = 0;
+             if (ctxt->instate == XML_PARSER_EOF) {
+@@ -4571,6 +4573,7 @@ xmlParseCharDataComplex(xmlParserCtxtPtr ctxt, int cdata) {
+         }
+         count++;
+         if (count > 50) {
++            SHRINK;
+             GROW;
+             count = 0;
+             if (ctxt->instate == XML_PARSER_EOF)
+@@ -4776,6 +4779,7 @@ xmlParseCommentComplex(xmlParserCtxtPtr ctxt, xmlChar *buf,
+ 
+         count++;
+         if (count > 50) {
++            SHRINK;
+             GROW;
+             count = 0;
+             if (ctxt->instate == XML_PARSER_EOF) {
+@@ -5186,6 +5190,7 @@ xmlParsePI(xmlParserCtxtPtr ctxt) {
+         }
+         count++;
+         if (count > 50) {
++            SHRINK;
+             GROW;
+             if (ctxt->instate == XML_PARSER_EOF) {
+                 xmlFree(buf);
+@@ -9783,6 +9788,7 @@ xmlParseCDSect(xmlParserCtxtPtr ctxt) {
+         sl = l;
+         count++;
+         if (count > 50) {
++            SHRINK;
+             GROW;
+             if (ctxt->instate == XML_PARSER_EOF) {
+                 xmlFree(buf);
+-- 
+2.25.1
+