Commit

Merge pull request #2510 from sparklemotion/flavorjones-encoding-reader-performance-v1.13.x

improve encoding reader performance (backport to v1.13.x)
flavorjones committed Apr 11, 2022
2 parents b848031 + e444525 commit 6a20ee4
Showing 2 changed files with 13 additions and 1 deletion.
2 changes: 1 addition & 1 deletion lib/nokogiri/html4/document.rb
@@ -268,7 +268,7 @@ def start_element(name, attrs = [])
end

def self.detect_encoding(chunk)
- (m = chunk.match(/\A(<\?xml[ \t\r\n]+[^>]*>)/)) &&
+ (m = chunk.match(/\A(<\?xml[ \t\r\n][^>]*>)/)) &&
(return Nokogiri.XML(m[1]).encoding)

if Nokogiri.jruby?
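A note on why dropping the `+` should be safe: every character in `[ \t\r\n]` is also matched by `[^>]*`, so both forms of the pattern accept exactly the same declarations. The `+` only gave the matcher a second way to consume the same run of whitespace, and that ambiguity is what allows many split points to be retried when no `>` ever arrives. The sketch below checks the equivalence on a few inputs. The two patterns are copied from this diff; the method names and sample strings are hypothetical and not part of Nokogiri.

```ruby
# Pattern with the quantified whitespace class (the form containing `+`).
def old_xml_decl_pattern
  /\A(<\?xml[ \t\r\n]+[^>]*>)/
end

# Pattern with a single whitespace character (the form without `+`).
def new_xml_decl_pattern
  /\A(<\?xml[ \t\r\n][^>]*>)/
end

# Returns the captured XML declaration, or nil when the chunk has none.
def xml_declaration(pattern, chunk)
  m = chunk.match(pattern)
  m && m[1]
end

# Hypothetical inputs chosen for illustration only.
def sample_chunks
  [
    %(<?xml version="1.0" encoding="UTF-8"?><html></html>),
    %(<?xml   \t\r\n version="1.0"?>),
    %(<?xml>),                # no whitespace after "xml"
    %(<?xml ) + (" " * 50),   # whitespace but no closing ">"
    %(<html></html>),         # no declaration at all
  ]
end

sample_chunks.each do |chunk|
  old_result = xml_declaration(old_xml_decl_pattern, chunk)
  new_result = xml_declaration(new_xml_decl_pattern, chunk)
  puts "#{chunk[0, 20].inspect}: same result? #{old_result == new_result}"
end
```

This only demonstrates that the two patterns agree on these samples; it does not measure the performance difference, which depends on the Ruby version and regex engine in use.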
12 changes: 12 additions & 0 deletions test/html4/test_document_encoding.rb
@@ -155,6 +155,18 @@ def binopen(file)
end
end
end

+ it "does not start backtracking during detection of XHTML encoding" do
+   # this test is a quick and dirty version
+   # of the more complete perf test that is on main.
+   n = 40_000
+   redos_string = "<?xml " + (" " * n)
+   redos_string.encode!("ASCII-8BIT")
+   start_time = Time.now
+   Nokogiri::HTML4(redos_string)
+   elapsed_time = Time.now - start_time
+   assert_operator(elapsed_time, :<, 1)
+ end
end
end
end
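The added test times the parse with `Time.now`, which follows the wall clock and can jump if the system time is adjusted. As a hedged alternative, the same measurement can be sketched with Ruby's monotonic clock. To stay runnable without Nokogiri, this sketch times the patched pattern from `lib/nokogiri/html4/document.rb` directly rather than calling `Nokogiri::HTML4`; all three method names are hypothetical helpers, not part of the test suite.

```ruby
# Hypothetical helper: runs the block and returns elapsed seconds,
# measured with the monotonic clock so system time changes do not matter.
def elapsed_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Same shape as the test input: "<?xml " followed by a long run of spaces
# and no closing ">".
def unterminated_xml_declaration(padding = 40_000)
  "<?xml " + (" " * padding)
end

# The patched pattern, copied from the diff.
def patched_declaration_match(chunk)
  chunk.match(/\A(<\?xml[ \t\r\n][^>]*>)/)
end

chunk = unterminated_xml_declaration
elapsed = elapsed_seconds { patched_declaration_match(chunk) }
puts format("patched pattern took %.6f s on %d bytes", elapsed, chunk.bytesize)
```

A fixed one-second threshold, as in the test above, remains a coarse check either way; it catches gross regressions rather than small slowdowns.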
