You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I recently upgraded from HTML-Proofer 3 to 5, and noticed that it was taking an exorbitant length of time to run on a site that had previously been fine.
I eventually traced the issue to a page with a bookmarklet on it; a long javascript:… URL with various URL-encoded entities – trying to check that URL was taking a very long time (multiple minutes).
Steps to reproduce
Here's a short program that tries to lint an HTML file with the problematic link:
require'html-proofer'File.open('bookmarklet.html','w')do |f|
f.puts'<a href="javascript:var%20ids%20=%20[%22omnibox%22,%20%22cards%22,%20%22welcome%22];var%20classes%20=%20[%22widget-viewcard%22,%20%22widget-zoom%22,%20%22watermark%22];var%20hidden%20=%20(window.getComputedStyle(document.getElementById(ids[0]))).getPropertyValue(%22display%22);if%20(hidden%20!==%20%22none%22)%20{var%20disp%20=%20%22none%22;}%20else%20{var%20disp%20=%20%22%22;}for%20(var%20i%20=%200;%20i%20<%20ids.length;%20i++)%20{document.getElementById(ids[i]).style.display%20=%20disp;}for%20(var%20i%20=%200;%20i%20<%20classes.length;%20i++)%20{var%20div%20=%20document.getElementsByClassName(classes[i]);for%20(var%20j%20=%200;%20j%20<%20div.length;%20j++)%20{div[j].style.display%20=%20disp;}}">Toggle Google Maps</a>'endHTMLProofer.check_file('bookmarklet.html',{checks: ['Links'],log_level: :debug}).run
The output I see is:
$ ruby repro.rbRunning 1 check (Links) in bookmarklet.html on *.html files ...Running Links in bookmarklet.html
but it never actually completes.
I ran this with the latest version of HTML-Proofer and Ruby:
I added a bunch of print statements to determine where it was going and where it was getting stuck. This stuff may be obvious to people who know the library, but this is a new codebase to me:
If you follow the chain of function calls, create_element(node) calls Element.new(@runner, node, base_url: base_url), and following that down I found the issue is somewhere in Attribute::Url – it gets stuck on this line:
This immediately starts consuming 100% of my Mac's CPU, and has been since I started writing this issue. It has yet to complete.
Suggested fix
This regex first appeared in #394. I'm not sure what it's meant to be doing.
I'm also not sure what clean_url! is for; I wonder if it should be doing anything with javascript: URLs? Is there anything HTML-Proofer can do with them downstream? If not, maybe it should just skip any URL starting javascript: here?
Notes
This problem occurs even if I add the data-proofer-ignore attribute to my <a> tag – that was mildly surprising to me.
The text was updated successfully, but these errors were encountered:
Hi, my apologies for not seeing this sooner. That regexp was, in fact, doing nothing, and was probably vestigial. Removing it in #820 still lets the test suite pass, so I'll release this fix in the next patch release this week.
This problem occurs even if I add the data-proofer-ignore attribute to my <a> tag – that was mildly surprising to me.
Hm, I tried to recreate this and couldn't. If you have another test case after the fix above is merged, I'll take a look again.
I recently upgraded from HTML-Proofer 3 to 5, and noticed that it was taking an exorbitant length of time to run on a site that had previously been fine.
I eventually traced the issue to a page with a bookmarklet on it; a long
javascript:…
URL with various URL-encoded entities – trying to check that URL was taking a very long time (multiple minutes).Steps to reproduce
Here's a short program that tries to lint an HTML file with the problematic link:
The output I see is:
but it never actually completes.
I ran this with the latest version of HTML-Proofer and Ruby:
The output I see is:
Debugging analysis
I added a bunch of print statements to determine where it was going and where it was getting stuck. This stuff may be obvious to people who know the library, but this is a new codebase to me:
Here's where that log message comes from:
html-proofer/lib/html_proofer/runner.rb
Lines 130 to 134 in 1c8b2c1
In this case
check
is a link check. Follow that down a level, I found it stalled on this line:html-proofer/lib/html_proofer/check/links.rb
Line 8 in 1c8b2c1
Here
node
is the<a>
tag, I think?If you follow the chain of function calls,
create_element(node)
callsElement.new(@runner, node, base_url: base_url)
, and following that down I found the issue is somewhere inAttribute::Url
– it gets stuck on this line:html-proofer/lib/html_proofer/element.rb
Line 19 in 1c8b2c1
And if you go and look at how
Attribute::Url
works, you see that the initialiser callsclean_url!
, and that turns out to be the root of the slowness:html-proofer/lib/html_proofer/attribute/url.rb
Lines 223 to 228 in 1c8b2c1
We can verify this by running my URL against the regex as a standalone program, and observing that it takes a very long time:
This immediately starts consuming 100% of my Mac's CPU, and has been since I started writing this issue. It has yet to complete.
Suggested fix
This regex first appeared in #394. I'm not sure what it's meant to be doing.
I'm also not sure what
clean_url!
is for; I wonder if it should be doing anything withjavascript:
URLs? Is there anything HTML-Proofer can do with them downstream? If not, maybe it should just skip any URL startingjavascript:
here?Notes
This problem occurs even if I add the
data-proofer-ignore
attribute to my<a>
tag – that was mildly surprising to me.The text was updated successfully, but these errors were encountered: