Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] 1.11.4 - New libxml2 recursion limit breaks parsing for complex files #2257

Closed
yedennek opened this issue Jun 2, 2021 · 6 comments
Closed

Comments

@yedennek
Copy link

yedennek commented Jun 2, 2021

In the latest Nokogiri update to 1.11.4, the containing libxml2 library has been updated to set a hard-coded recursion limit. (See commit 6f1470a5 of that project).

On our end, we have to parse government-supplied files for tax specifications which are, admittedly, very large and complicated and we're not allowed to modify them.

Unfortunately the latest update means we can no longer parse these files and hit an XPath error : Recursion limit exceeded error when we call the following to read the linked CT-2014-v1-95.xslt.

Nokogiri::XSLT(File.read(SCHEMA_DIR.join(xsl_filename)))

In our tests this is manifesting in a test failure parsing line 7669 of the CT-2014-v1-95.xslt file:

compilation error: element when xsl:when : could not compile test expression

Expected behavior

Ideally, we'd like to be able to parse the horribly complicated file, but it would be incredibly useful to be able to have an override setting for the maximum recursion limit in order to override the default.

Environment

# Nokogiri (1.11.4)
    ---
    warnings: []
    nokogiri:
      version: 1.11.4
      cppflags:
      - "-I/Users/x/.gem/ruby/2.7.3/gems/nokogiri-1.11.4-x86_64-darwin/ext/nokogiri"
      - "-I/Users/x/.gem/ruby/2.7.3/gems/nokogiri-1.11.4-x86_64-darwin/ext/nokogiri/include"
      - "-I/Users/x/.gem/ruby/2.7.3/gems/nokogiri-1.11.4-x86_64-darwin/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 2.7.3
      platform: x86_64-darwin20
      gem_platform: x86_64-darwin-20
      description: ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-darwin20]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Remove-script-macro-support.patch
      - 0002-Update-entities-to-remove-handling-of-ssi.patch
      - 0003-libxml2.la-is-in-top_builddir.patch
      - 0004-use-glibc-strlen.patch
      - 0005-avoid-isnan-isinf.patch
      - 0006-update-automake-files-for-arm64.patch
      libxml2_path: "/Users/x/.gem/ruby/2.7.3/gems/nokogiri-1.11.4-x86_64-darwin/ext/nokogiri"
      iconv_enabled: true
      compiled: 2.9.12
      loaded: 2.9.12
    libxslt:
      source: packaged
      precompiled: true
      patches:
      - 0001-update-automake-files-for-arm64.patch
      - 0002-Fix-xml2-config-check-in-configure-script.patch
      compiled: 1.1.34
      loaded: 1.1.34
    other_libraries:
      zlib: 1.2.11
      libiconv: '1.15'

Additional context

We think we might be able to fix this temporarily on our end with a patch on a forked version of Nokogiri (which we're still testing) but we'd rather not do this permanently, and were wondering whether it would be possible to get some customisable limit for Nokogiri?

Proposed patch:

diff --git a/xpath.c b/xpath.c
index 7497ba07..bc1fa094 100644
--- a/xpath.c
+++ b/xpath.c
@@ -143,7 +143,7 @@
 #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 #define XPATH_MAX_RECURSION_DEPTH 500
 #else
-#define XPATH_MAX_RECURSION_DEPTH 5000
+#define XPATH_MAX_RECURSION_DEPTH 10000
 #endif
 /*

To note, I’ve also raised this with libxml2 in this issue here.

-->

@yedennek yedennek added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Jun 2, 2021
@flavorjones
Copy link
Member

👋 Hi, @yedennek, thanks for opening this issue. Sorry you're having this problem!

Let me take a look.

@flavorjones
Copy link
Member

I've reproduced what you're seeing, and I've confirmed via git-bisect that this limit was introduced in https://gitlab.gnome.org/GNOME/libxml2/-/commit/6f1470a5d6e3e369fe93f52d5760ba7c947f0cd1

@flavorjones flavorjones added upstream/libxml2 and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Jun 2, 2021
@flavorjones
Copy link
Member

Nick, one of the libxml2 maintainers, has already made a fix upstream in how stack depth was being calculated!

https://gitlab.gnome.org/GNOME/libxml2/-/commit/3e1aad4fe584747fd7d17cc7b2863a78e2d21a77

I'll apply this downstream in Nokogiri and cut a release for you.

@flavorjones
Copy link
Member

Assuming CI goes green shortly on the v1.11.x branch (https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri-v1.11.x), I'll cut a release tonight.

@flavorjones
Copy link
Member

Shipped Release 1.11.7 / 2021-06-02 · sparklemotion/nokogiri

@yedennek
Copy link
Author

yedennek commented Jun 3, 2021

That's amazing, thank you so much for the quick response! 🙏🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants