Reduce allocations from hack in document_fragment #2087

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA

### Improved

+* Reduce the number of object allocations needed when parsing an HTML::DocumentFragment. [[#2087](https://github.com/sparklemotion/nokogiri/issues/2087)] (Thanks, [@ashmaroli](https://github.com/ashmaroli)!)
* [JRuby] Update the algorithm used to calculate `Node#line` to be wrong less-often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. [[#1223](https://github.com/sparklemotion/nokogiri/issues/1223)]


30 changes: 15 additions & 15 deletions lib/nokogiri/html/document_fragment.rb
@@ -4,26 +4,26 @@ module HTML
     class DocumentFragment < Nokogiri::XML::DocumentFragment
       ####
       # Create a Nokogiri::XML::DocumentFragment from +tags+, using +encoding+
-      def self.parse tags, encoding = nil
+      def self.parse(tags, encoding = nil)
         doc = HTML::Document.new

         encoding ||= if tags.respond_to?(:encoding)
-                       encoding = tags.encoding
-                       if encoding == ::Encoding::ASCII_8BIT
-                         'UTF-8'
-                       else
-                         encoding.name
-                       end
-                     else
-                       'UTF-8'
-                     end
+          encoding = tags.encoding
+          if encoding == ::Encoding::ASCII_8BIT
+            'UTF-8'
+          else
+            encoding.name
+          end
+        else
+          'UTF-8'
+        end

         doc.encoding = encoding

         new(doc, tags)
       end

-      def initialize document, tags = nil, ctx = nil
+      def initialize(document, tags = nil, ctx = nil)
         return self unless tags

         if ctx
@@ -33,13 +33,13 @@ def initialize document, tags = nil, ctx = nil
           self.errors = document.errors - preexisting_errors
         else
           # This is a horrible hack, but I don't care
-          if tags.strip =~ /^<body/i
-            path = "/html/body"
+          path = if /^\s*?<body/i.match?(tags)
+            "/html/body"
           else
-            path = "/html/body/node()"
+            "/html/body/node()"
           end

-          temp_doc = HTML::Document.parse "<html><body>#{tags}", nil, document.encoding
+          temp_doc = HTML::Document.parse("<html><body>#{tags}", nil, document.encoding)
           temp_doc.xpath(path).each { |child| child.parent = self }
           self.errors = temp_doc.errors
         end
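
For anyone wanting to see where the saved allocations come from, here is a rough sketch (not part of this PR) that counts objects allocated by the old and new `<body>` checks. It assumes CRuby 2.4 or later, where `Regexp#match?` and `GC.stat(:total_allocated_objects)` are available. The `allocations` helper and the sample `tags` string are made up for illustration, and exact counts may vary between Ruby versions.

```ruby
# Sketch only: compares object allocations for the two <body> checks in the diff.
# Assumes CRuby 2.4+; `allocations` is a hypothetical helper, not a Nokogiri API.

def allocations(iterations = 1_000)
  before = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical fragment with leading whitespace before <body>.
tags = "  <body><p>hello</p></body>"

# Old check: String#strip builds a new String on every call, and a
# successful =~ also creates a MatchData object to back $~.
old_check = allocations { tags.strip =~ /^<body/i }

# New check: Regexp#match? returns true/false without setting $~,
# and the leading \s*? removes the need to strip a copy first.
new_check = allocations { /^\s*?<body/i.match?(tags) }

puts "strip + =~ : #{old_check} objects"
puts "match?     : #{new_check} objects"
```

On CRuby the first figure should be at least one object per iteration (the stripped copy), while the second should stay close to zero.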