Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge input lookup with large CDATA when parsing (SAX) #2132

Closed
CloCkWeRX opened this issue Dec 10, 2020 · 6 comments · Fixed by #2182
Closed

Huge input lookup with large CDATA when parsing (SAX) #2132

CloCkWeRX opened this issue Dec 10, 2020 · 6 comments · Fixed by #2182

Comments

@CloCkWeRX
Copy link

Please describe the bug
Potentially a duplicate of #2028 - however this input didn't have linebreaks.

Help us reproduce what you're seeing
Executable test case:

#! /usr/bin/env ruby
# encoding: utf-8

require 'nokogiri'
require 'stringio'

TEMPLATE = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns="http://example.com">
   <soapenv:Header>
      <AuthHeader xsi:type="ns:vAuthHeader">
      <userName xsi:type="xsd:string">gorilla</userName>
      <password xsi:type="xsd:string">secret</password>
    </AuthHeader>
   </soapenv:Header>
  <soapenv:Body>
    <ns:checkToken soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <checkToken xsi:type="xsd:string"><![CDATA[%s]]></checkToken>
    </ns:checkToken>
   </soapenv:Body>
</soapenv:Envelope>
XML




class Handler < Nokogiri::XML::SAX::Document
  def error(message)
    raise message
  end
end

def try_parse(number)
  huge_data = "a" * (1024 * 1024 * number)
  xml = StringIO.new(TEMPLATE % (huge_data))
  parser = Nokogiri::XML::SAX::Parser.new(Handler.new)
  begin
    parser.parse(xml)
  rescue Exception => e
    puts "#{number} raises #{e}"
    return
  end
  puts "#{number} is OK"
end

try_parse 10 # error
try_parse 9.5 # ok

Expected behavior

No errors in both cases

Environment

$ nokogiri -v
# Nokogiri (1.11.0.rc3)
    ---
    warnings: []
    nokogiri: 1.11.0.rc3
    ruby:
      version: 2.5.3
      platform: x86_64-linux
      gem_platform: x86_64-linux
      description: ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
      engine: ruby
    libxml:
      source: packaged
      patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      - 0002-Remove-script-macro-support.patch
      - 0003-Update-entities-to-remove-handling-of-ssi.patch
      - 0004-libxml2.la-is-in-top_builddir.patch
      - 0005-Fix-infinite-loop-in-xmlStringLenDecodeEntities.patch
      compiled: 2.9.10
      loaded: 2.9.10
    libxslt:
      source: packaged
      patches: []
      compiled: 1.1.34
      loaded: 1.1.34

Additional context

@CloCkWeRX CloCkWeRX added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Dec 10, 2020
@flavorjones
Copy link
Member

@CloCkWeRX Thanks for submitting this report! I'll take a look as soon as I can to determine if it's a dupe of #2028 ... and if so will probably reconsider whether we should vendor the upstream patch to address it.

@flavorjones
Copy link
Member

@CloCkWeRX Unfortunately the libxml2 patches that addressed #2028 do not address this particular problem. I'll noodle a bit to see if I can discover the root cause.

@flavorjones
Copy link
Member

flavorjones commented Dec 11, 2020

OK, found the problem, which is similar to the issue fixed by this (unmerged) upstream commit: https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e1ee77783e43c9059af00cd326deee3372

Let's see what @nwellnhof thinks.

Edit: Note that the following patch, applied to libxml2, will address your issue:

diff --git a/parser.c b/parser.c
index 85494df..6f6ef78 100644
--- a/parser.c
+++ b/parser.c
@@ -9776,6 +9776,7 @@ xmlParseCDSect(xmlParserCtxtPtr ctxt) {
    sl = l;
    count++;
    if (count > 50) {
+       SHRINK;
        GROW;
             if (ctxt->instate == XML_PARSER_EOF) {
        xmlFree(buf);

@flavorjones flavorjones added upstream/libxml2 and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Dec 11, 2020
@flavorjones
Copy link
Member

Please note that I've also mentioned in an upstream issue at https://gitlab.gnome.org/GNOME/libxml2/-/issues/200

@flavorjones
Copy link
Member

Upstream has asked me to submit a PR to fix this in libxml2 -- see Nick's comment at https://gitlab.gnome.org/GNOME/libxml2/-/issues/200. I'll try to get to that in the next few days.

flavorjones added a commit that referenced this issue Jan 31, 2021
This patch shrinks the libxml2 input buffer in a few parser functions.

Fixes #2132
flavorjones added a commit that referenced this issue Jan 31, 2021
This patch shrinks the libxml2 input buffer in a few parser functions.

Fixes #2132
@flavorjones
Copy link
Member

I've submitted a merge request upstream at https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/100

I've also got a PR with the test case @CloCkWeRX provided and the libxml2 patch ready at #2182.

This was referenced Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants