improve encoding reader performance (backport to v1.13.x) #2510

Merged
merged 2 commits into from Apr 11, 2022
2 changes: 1 addition & 1 deletion lib/nokogiri/html4/document.rb
@@ -268,7 +268,7 @@ def start_element(name, attrs = [])
end

def self.detect_encoding(chunk)
-(m = chunk.match(/\A(<\?xml[ \t\r\n]+[^>]*>)/)) &&
+(m = chunk.match(/\A(<\?xml[ \t\r\n][^>]*>)/)) &&
(return Nokogiri.XML(m[1]).encoding)

if Nokogiri.jruby?
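The two patterns in this hunk should accept exactly the same inputs: `[^>]*` already matches whitespace, so a single required whitespace character followed by `[^>]*` covers everything that `[ \t\r\n]+[^>]*` did, without giving the regex engine two overlapping quantifiers to backtrack through. The following is a minimal sketch of that equivalence, run outside Nokogiri. The constants `OLD_PATTERN` and `NEW_PATTERN` and the helper `xml_declaration` are hypothetical names introduced here for illustration; only the regex literals come from the diff.

```ruby
# Regex literals copied from the diff; the constant names are assumptions.
OLD_PATTERN = /\A(<\?xml[ \t\r\n]+[^>]*>)/
NEW_PATTERN = /\A(<\?xml[ \t\r\n][^>]*>)/

# Hypothetical helper: return the captured XML declaration, or nil if none.
def xml_declaration(chunk, pattern)
  match = chunk.match(pattern)
  match && match[1]
end

samples = [
  %(<?xml version="1.0" encoding="UTF-8"?><html></html>),
  %(<?xml \t\r\n version="1.0"?>),
  "<?xmlversion>",           # no whitespace after "xml": neither pattern matches
  "<?xml " + (" " * 1_000)   # unterminated declaration: neither pattern matches
]

# Both patterns should return the same capture (or nil) for every sample.
samples.each do |chunk|
  old_result = xml_declaration(chunk, OLD_PATTERN)
  new_result = xml_declaration(chunk, NEW_PATTERN)
  raise "patterns disagree on #{chunk.inspect}" unless old_result == new_result
end
```

The unterminated sample is kept short on purpose, since the old pattern's cost on that input can grow quadratically with the number of spaces on regex engines that do not memoize.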
2 changes: 1 addition & 1 deletion lib/nokogiri/xml/node.rb
@@ -123,7 +123,7 @@ class Node
# [Yields] Nokogiri::XML::Node
# [Returns] Nokogiri::XML::Node
#
-def initialize(name, document)
+def initialize(name, document) # rubocop:disable Style/RedundantInitialize
# This is intentionally empty.
end

2 changes: 1 addition & 1 deletion lib/nokogiri/xml/processing_instruction.rb
@@ -3,7 +3,7 @@
module Nokogiri
module XML
class ProcessingInstruction < Node
-def initialize(document, name, content)
+def initialize(document, name, content) # rubocop:disable Style/RedundantInitialize
end
end
end
12 changes: 12 additions & 0 deletions test/html4/test_document_encoding.rb
@@ -155,6 +155,18 @@ def binopen(file)
end
end
end
+
+it "does not start backtracking during detection of XHTML encoding" do
+# this test is a quick and dirty version
+# of the more complete perf test that is on main.
+n = 40_000
+redos_string = "<?xml " + (" " * n)
+redos_string.encode!("ASCII-8BIT")
+start_time = Time.now
+Nokogiri::HTML4(redos_string)
+elapsed_time = Time.now - start_time
+assert_operator(elapsed_time, :<, 1)
+end
end
end
end
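The test above measures elapsed time with `Time.now`, which follows the wall clock and can jump if the system time is adjusted. A possible variant, sketched here under the assumption that only the regex step matters, times the new pattern directly with a monotonic clock and does not load Nokogiri at all. The helper `elapsed_seconds` and the constant `NEW_DECLARATION_PATTERN` are hypothetical names, not part of this PR.

```ruby
# Hypothetical helper: seconds spent in the block, measured on a monotonic clock.
def elapsed_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Regex literal copied from the new side of the diff; the name is an assumption.
NEW_DECLARATION_PATTERN = /\A(<\?xml[ \t\r\n][^>]*>)/

# Same pathological input as the test: an XML declaration that never closes.
n = 40_000
redos_string = ("<?xml " + (" " * n)).encode("ASCII-8BIT")

result = nil
elapsed = elapsed_seconds { result = redos_string.match(NEW_DECLARATION_PATTERN) }

puts "matched: #{!result.nil?}, elapsed: #{elapsed.round(4)}s"
```

This only exercises the pattern, so it complements rather than replaces the test in the diff, which goes through `Nokogiri::HTML4` end to end.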