Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Nokogiri::HTML::DocumentFragment#path causes a segmentation fault #2250

Closed
cbeer opened this issue May 25, 2021 · 4 comments · Fixed by #2251 or #2252
Closed

[bug] Nokogiri::HTML::DocumentFragment#path causes a segmentation fault #2250

cbeer opened this issue May 25, 2021 · 4 comments · Fixed by #2251 or #2252

Comments

@cbeer
Copy link

cbeer commented May 25, 2021

Please describe the bug

Nokogiri 1.11.4+ segfaults when you call path on a Nokogiri::HTML::DocumentFragment.


a.rb:14: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]

-- Control frame information -----------------------------------------------
c:0024 p:---- s:0131 e:000130 CFUNC  :path
c:0023 p:0031 s:0127 e:000124 BLOCK  a.rb:14 [FINISH]
....

Help us reproduce what you're seeing

#! /usr/bin/env ruby

require 'nokogiri'
require 'minitest/autorun'

class Test < MiniTest::Spec
  describe "Nokogiri::HTML::DocumentFragment#path" do
    it "should return the expected value" do
      html = <<~HEREDOC
        <div></div>
      HEREDOC

      fragment = Nokogiri::HTML.fragment(html)
      assert_equal "?", fragment.path
    end
  end
end

Expected behavior

Nokogiri 1.11.3 and below returned ? (as does Nokogiri::XML::DocumentFragment#path), but 1.11.4 and 1.11.5 both segfault.

Environment

# Nokogiri (1.11.5)
    ---
    warnings: []
    nokogiri:
      version: 1.11.5
      cppflags:
      - "-I/Users/cabeer/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.5/ext/nokogiri"
      - "-I/Users/cabeer/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.5/ext/nokogiri/include"
      - "-I/Users/cabeer/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.5/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 3.0.0
      platform: x86_64-darwin19
      gem_platform: x86_64-darwin-19
      description: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
      engine: ruby
    libxml:
      source: packaged
      precompiled: false
      patches:
      - 0001-Remove-script-macro-support.patch
      - 0002-Update-entities-to-remove-handling-of-ssi.patch
      - 0003-libxml2.la-is-in-top_builddir.patch
      - 0004-use-glibc-strlen.patch
      - 0005-avoid-isnan-isinf.patch
      - 0006-update-automake-files-for-arm64.patch
      libxml2_path: "/Users/cabeer/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.5/ext/nokogiri"
      memory_management: ruby
      iconv_enabled: true
      compiled: 2.9.12
      loaded: 2.9.12
    libxslt:
      source: packaged
      precompiled: false
      patches:
      - 0001-update-automake-files-for-arm64.patch
      - 0002-Fix-xml2-config-check-in-configure-script.patch
      compiled: 1.1.34
      loaded: 1.1.34
    other_libraries: {}

Additional context

Initially reported to teamcapybara/capybara#2466.

@cbeer cbeer added the state/needs-triage Inbox for non-installation-related bug reports or help requests label May 25, 2021
@flavorjones
Copy link
Member

Thanks for opening this issue. I'll take a look shortly!

@flavorjones
Copy link
Member

Diagnosis

I've git-bisected the introduction of this issue to libxml2@e20c9c1, which causes xmlGetNodePath to return NULL, and we're not properly checking for failure when we call it in xml_node.c.

History

The ? return value is an accident of history -- the original implementation of xmlGetNodePath dates back to the ancestor of that function, xmlShellPwd, which was written in 1999 as part of libxml2's debugXML.c REPL for interactively browsing a DOM.

In a REPL prompt, showing a question mark in error cases might make sense; but not for a function that is supposed to return a valid XPath query.

What's the right behavior?

Which brings me to my question for you: What do you expect this method to return for a DocumentFragment? For Nodes, it returns valid XPath; for Documents, it returns "/". In both of those cases, querying with the returned string returns the object itself, so:

  assert_equal(xml_document, xml_document.at_xpath(xml_document.path))
  assert_equal(xml_node,     xml_document.at_xpath(xml_node.path))

and so maybe the answer is "it doesn't matter, as long as we can get back the DocumentFragment ...

But then that gets more complicated because a DocumentFragment is not something the the XPath spec recognizes, or that libxml2 can return valid XPath for. This is (partly) the reason that XPath queries within DocumentFragments are so broken (see the DocumentFragment portion of ROADMAP.md). We'd need to hack it some more to make the round-trip frag.at_xpath(frag.path) work properly.

What's the "right-now" behavior?

Would you be open to me fixing this to just return "?" as before in this case (in the v1.11.x release series), to avoid breaking anybody relying on this behavior; and then we can look into fixing this For Real™ in a future minor or major release?

@flavorjones
Copy link
Member

#2251 is a PR that will restore the "?" behavior, and #2252 is the same fix for the v1.11.x branch (because I want to fix this in a patch release).

@flavorjones flavorjones added this to the v1.11.x patch releases milestone May 26, 2021
@flavorjones flavorjones added upstream/libxml2 and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels May 26, 2021
@flavorjones flavorjones linked a pull request May 26, 2021 that will close this issue
@flavorjones
Copy link
Member

Fixed in v1.11.6, just released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants