Skip to content

Commit

Permalink
Merge pull request #2388 from sparklemotion/1158-use-the-right-classes
Browse files Browse the repository at this point in the history
use the correct classes for fragment parsing
  • Loading branch information
flavorjones committed Dec 20, 2021
2 parents 66c15d9 + 38c2f16 commit 627a97d
Show file tree
Hide file tree
Showing 9 changed files with 113 additions and 17 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -24,6 +24,7 @@ A related discussion about Trust exists at [#2357](https://github.com/sparklemot
### Fixed

* XML::Builder blocks restore context properly when exceptions are raised. [[#2372](https://github.com/sparklemotion/nokogiri/issues/2372)] (Thanks, [@ric2b](https://github.com/ric2b) and [@rinthedev](https://github.com/rinthedev)!)
* Error recovery from in-context parsing (e.g., `Node#parse`) now always uses the correct `DocumentFragment` class. Previously `Nokogiri::HTML4::DocumentFragment` was always used, even for XML documents. [[#1158](https://github.com/sparklemotion/nokogiri/issues/1158)]
* [CRuby] Fix memory leak in `Document#canonicalize` when inclusive namespaces are passed in. [[#2345](https://github.com/sparklemotion/nokogiri/issues/2345)]
* [CRuby] Fix memory leak in `Document#canonicalize` when an argument type error is raised. [[#2345](https://github.com/sparklemotion/nokogiri/issues/2345)]
* [CRuby] Fix memory leak in `EncodingHandler` where iconv handlers were not being cleaned up. [[#2345](https://github.com/sparklemotion/nokogiri/issues/2345)]
Expand Down
1 change: 1 addition & 0 deletions lib/nokogiri.rb
Expand Up @@ -115,6 +115,7 @@ def Nokogiri(*args, &block)
end

require_relative "nokogiri/version"
require_relative "nokogiri/class_resolver"
require_relative "nokogiri/syntax_error"
require_relative "nokogiri/xml"
require_relative "nokogiri/xslt"
Expand Down
67 changes: 67 additions & 0 deletions lib/nokogiri/class_resolver.rb
@@ -0,0 +1,67 @@
# coding: utf-8
# frozen_string_literal: true

require "set"

module Nokogiri
#
# Some classes in Nokogiri are namespaced as a group, for example
# Document, DocumentFragment, and Builder.
#
# It's sometimes necessary to look up the related class, e.g.:
#
# XML::Builder → XML::Document
# HTML4::Builder → HTML4::Document
# HTML5::Document → HTML5::DocumentFragment
#
# This module is included into those key classes who need to do this.
#
module ClassResolver
# #related_class restricts matching namespaces to those matching this set.
VALID_NAMESPACES = Set.new(["HTML", "HTML4", "HTML5", "XML"])

# :call-seq:
# related_class(class_name) → Class
#
# Find a class constant within the
#
# Some examples:
#
# Nokogiri::XML::Document.new.related_class("DocumentFragment")
# # => Nokogiri::XML::DocumentFragment
# Nokogiri::HTML4::Document.new.related_class("DocumentFragment")
# # => Nokogiri::HTML4::DocumentFragment
#
# Note this will also work for subclasses that follow the same convention, e.g.:
#
# Loofah::HTML::Document.new.related_class("DocumentFragment")
# # => Loofah::HTML::DocumentFragment
#
# And even if it's a subclass, this will iterate through the superclasses:
#
# class ThisIsATopLevelClass < Nokogiri::HTML4::Builder ; end
# ThisIsATopLevelClass.new.related_class("Document")
# # => Nokogiri::HTML4::Document
#
def related_class(class_name)
klass = nil
inspecting = self.class

while inspecting
namespace_path = inspecting.name.split("::")[0..-2]
inspecting = inspecting.superclass

next unless VALID_NAMESPACES.include?(namespace_path.last)

related_class_name = (namespace_path << class_name).join("::")
klass = begin
Object.const_get(related_class_name)
rescue NameError
nil
end
break if klass
end
klass
end
end
end
10 changes: 3 additions & 7 deletions lib/nokogiri/xml/builder.rb
Expand Up @@ -262,6 +262,8 @@ module XML
# </root>
#
class Builder
include Nokogiri::ClassResolver

DEFAULT_DOCUMENT_OPTIONS = { namespace_inheritance: true }

# The current Document object being built
Expand Down Expand Up @@ -307,13 +309,7 @@ def initialize(options = {}, root = nil, &block)
@doc = root.document
@parent = root
else
klassname = "::" + (self.class.name.split("::")[0..-2] + ["Document"]).join("::")
klass = begin
Object.const_get(klassname)
rescue NameError
Nokogiri::XML::Document
end
@parent = @doc = klass.new
@parent = @doc = related_class("Document").new
end

@context = nil
Expand Down
6 changes: 3 additions & 3 deletions lib/nokogiri/xml/node.rb
Expand Up @@ -56,6 +56,7 @@ module XML
class Node
include Nokogiri::XML::PP::Node
include Nokogiri::XML::Searchable
include Nokogiri::ClassResolver
include Enumerable

# Element node type, see Nokogiri::XML::Node#element?
Expand Down Expand Up @@ -936,8 +937,7 @@ def matches?(selector)
# Create a DocumentFragment containing +tags+ that is relative to _this_
# context node.
def fragment(tags)
type = document.html? ? Nokogiri::HTML : Nokogiri::XML
type::DocumentFragment.new(document, tags, self)
document.related_class("DocumentFragment").new(document, tags, self)
end

###
Expand Down Expand Up @@ -988,7 +988,7 @@ def parse(string_or_io, options = nil)
node_set = in_context(contents, options.to_i)
if node_set.empty? && (document.errors.length > error_count)
if options.recover?
fragment = Nokogiri::HTML4::DocumentFragment.parse(contents)
fragment = document.related_class("DocumentFragment").parse(contents)
node_set = fragment.children
else
raise document.errors[error_count]
Expand Down
1 change: 1 addition & 0 deletions nokogiri.gemspec
Expand Up @@ -232,6 +232,7 @@ Gem::Specification.new do |spec|
"lib/nekodtd.jar",
"lib/nekohtml.jar",
"lib/nokogiri.rb",
"lib/nokogiri/class_resolver.rb",
"lib/nokogiri/css.rb",
"lib/nokogiri/css/node.rb",
"lib/nokogiri/css/parser.rb",
Expand Down
7 changes: 5 additions & 2 deletions rakelib/check-manifest.rake
Expand Up @@ -18,6 +18,7 @@ task :check_manifest do
coverage
doc
gems
misc
nokogumbo-import
oci-images
patches
Expand All @@ -38,6 +39,8 @@ task :check_manifest do
.gitignore
.gitmodules
.yardopts
.rubocop.yml
.rubocop_todo.yml
CHANGELOG.md
CODE_OF_CONDUCT.md
CONTRIBUTING.md
Expand Down Expand Up @@ -76,10 +79,10 @@ task :check_manifest do

unless missing_files.empty?
puts "missing:"
missing_files.each { |f| puts "- #{f}" }
missing_files.sort.each { |f| puts "- #{f}" }
end
unless extra_files.empty?
puts "unexpected:"
extra_files.each { |f| puts "+ #{f}" }
extra_files.sort.each { |f| puts "+ #{f}" }
end
end
11 changes: 9 additions & 2 deletions test/xml/test_builder.rb
Expand Up @@ -424,11 +424,14 @@ def test_builder_uses_proper_document_class
xml_builder = Nokogiri::XML::Builder.new
assert_instance_of(Nokogiri::XML::Document, xml_builder.doc)

html_builder = Nokogiri::HTML::Builder.new
assert_instance_of(Nokogiri::HTML::Document, html_builder.doc)
html_builder = Nokogiri::HTML4::Builder.new
assert_instance_of(Nokogiri::HTML4::Document, html_builder.doc)

foo_builder = ThisIsATestBuilder.new
assert_instance_of(Nokogiri::XML::Document, foo_builder.doc)

foo_builder = ThisIsAnotherTestBuilder.new
assert_instance_of(Nokogiri::HTML4::Document, foo_builder.doc)
end

private
Expand All @@ -443,3 +446,7 @@ def namespaces_defined_on(node)
class ThisIsATestBuilder < Nokogiri::XML::Builder
# this exists for the test_builder_uses_proper_document_class and should be empty
end

class ThisIsAnotherTestBuilder < Nokogiri::HTML4::Builder
# this exists for the test_builder_uses_proper_document_class and should be empty
end
26 changes: 23 additions & 3 deletions test/xml/test_node.rb
Expand Up @@ -91,33 +91,45 @@ def test_parse_config_option
end

def test_node_context_parsing_of_malformed_html_fragment
doc = HTML.parse("<html><body><div></div></body></html>")
doc = HTML4.parse("<html><body><div></div></body></html>")
context_node = doc.at_css("div")
nodeset = context_node.parse("<div </div>")

assert_equal(1, doc.errors.length)
assert_equal(1, nodeset.length)
assert_equal("<div></div>", nodeset.to_s)
assert_instance_of(Nokogiri::HTML4::Document, nodeset.document)
assert_instance_of(Nokogiri::HTML4::Document, nodeset.first.document)
end

def test_node_context_parsing_of_malformed_html_fragment_with_recover_is_corrected
doc = HTML.parse("<html><body><div></div></body></html>")
doc = HTML4.parse("<html><body><div></div></body></html>")
context_node = doc.at_css("div")
nodeset = context_node.parse("<div </div>", &:recover)

assert_equal(1, doc.errors.length)
assert_equal(1, nodeset.length)
assert_equal("<div></div>", nodeset.to_s)
assert_instance_of(Nokogiri::HTML4::Document, nodeset.document)
assert_instance_of(Nokogiri::HTML4::Document, nodeset.first.document)
end

def test_node_context_parsing_of_malformed_html_fragment_without_recover_is_not_corrected
doc = HTML.parse("<html><body><div></div></body></html>")
doc = HTML4.parse("<html><body><div></div></body></html>")
context_node = doc.at_css("div")
assert_raises(Nokogiri::XML::SyntaxError) do
context_node.parse("<div </div>", &:strict)
end
end

def test_node_context_parsing_of_malformed_xml_fragment_uses_the_right_class_to_recover
doc = XML.parse("<root><body><div></div></body></root>")
context_node = doc.at_css("div")
nodeset = context_node.parse("<div </div") # causes an error and recovers
assert_instance_of(Nokogiri::XML::Document, nodeset.document)
assert_instance_of(Nokogiri::XML::Document, nodeset.first.document)
end

def test_parse_error_list
error_count = xml.errors.length
xml.root.parse("<hello>")
Expand Down Expand Up @@ -421,6 +433,14 @@ def test_duplicate_node_removes_namespace
end
end

def test_fragment_creates_appropriate_class
frag = Nokogiri.XML("<root><child/></root>").at_css("child").fragment("<thing/>")
assert_instance_of(Nokogiri::XML::DocumentFragment, frag)

frag = Nokogiri.HTML4("<root><child/></root>").at_css("child").fragment("<thing/>")
assert_instance_of(Nokogiri::HTML4::DocumentFragment, frag)
end

def test_fragment_creates_elements
apple = xml.fragment("<Apple/>")
apple.children.each do |child|
Expand Down

0 comments on commit 627a97d

Please sign in to comment.