Fix recheck of failures whith caching #556

riccardoporreca · 2020-02-09T21:07:13Z

According to the README

Links that were failures are kept in the cache and always rechecked.

However, a few things do not work as expected, as shown in the following minimal example

mkdir my-jekyll-website
cd my-jekyll-website
bundle init
bundle install --path vendor/bundle
bundle add jekyll
bundle exec jekyll new --force --skip-bundle .
bundle add html-proofer
bundle install
echo "[foo.bar](https://fooooo.baaaar)" >> about.markdown
echo "[foo.bar](https://foo.baaaaar/)" >> about.markdown
echo "[foo.bar](https://www.github.com/foo.bar.foooooo)" >> about.markdown
bundle exec jekyll build
bundle exec htmlproofer --log-level :debug _site --timeframe 1h
bundle exec htmlproofer --log-level :debug _site --timeframe 1h

with the relevant part of the log from the 2nd call being

Adding https://fooooo.baaaar to cache check
Adding 1 link to the cache...
Removing https://fooooo.baaaar/ from cache check
Removing 1 link from the cache...
Checking 1 external link...

There are two issues here:

URLs with failures are actually not rechecked.
For the case of https://fooooo.baaaar, the URL is actually removed and added due to an issue with the trailing /.

This pull request concerns mainly point 1, which I believe is a genuine issue with

html-proofer/lib/html-proofer/cache.rb

Lines 121 to 125 in 65f7e99

    
           if within_timeframe?(cache['time']) 
        
             next if cache['message'].empty? # these were successes to skip 
        
           else 
        
             urls_to_check[url] = cache['filenames'] # recheck expired links 
        
           end

Since I noticed that unit tests existed and were supposed to cover such case, I actually did a preliminary cleanup of a few cache-related tests, where the existing logs and the time traveling were often not consistent. See 821ff0f (details in the commit message) where the test about re-checking failures is broken as in the example above, with URLs not being re-checked.
The actual fix is committed in d0016ca (+ few minor later commits).

Just a small disclaimer: I really like html-proofer and was happy to investigate the issue and contribute (also being familiar with unit testing and mocking), despite being pretty new to Ruby. Hope I did not overlook (too many) things.

As for the point 2, this is possibly related to effective_url being used instead of href when calling handle_failure (which eventually passes it to @cache.add)

html-proofer/lib/html-proofer/url_validator.rb

Lines 166 to 167 in 65f7e99

    
             handle_failure(effective_url, filenames, response_code, response.return_message) 
        
           elsif method == :head

Not sure if there was a reason for preferring effective_url in this context, if not I can include in this PR switching to href (I would also have an uncommitted test for this).

* Rely on mocking :within_timeframe? instead of time travel (previous dates were not consistent) and fix expectations on :add. * Also fix the log for the recheck failure test: it did not contain the actual URL, which was then of added as new URL => This reveals a bug in handling failure re-checks.

codecov-io · 2020-02-09T21:11:57Z

Codecov Report

❗ No coverage uploaded for pull request base (master@1dce201). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master     #556   +/-   ##
=========================================
  Coverage          ?   98.17%           
=========================================
  Files             ?       30           
  Lines             ?     1969           
  Branches          ?        0           
=========================================
  Hits              ?     1933           
  Misses            ?       36           
  Partials          ?        0

Impacted Files	Coverage Δ
lib/html-proofer/cache.rb	`96.73% <100%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1dce201...29ed4e7. Read the comment docs.

riccardoporreca · 2020-02-11T11:16:03Z

I could have checked this already, but this misbehavior was apparently introduced recently (Nov 2019) with 85b67a0#diff-fe945704107273e5238902a425c3d386, and the logic was correct before then

gjtorikian · 2020-02-11T12:30:38Z

Hey, thanks for this! I really appreciate the careful PR.

I have made a few slight changes to address the tests for issue 1; I thought there was a bit too much stubbing of within_timeframe? and would prefer just the TImecop time traveling to assert that methods were (or were not) being called.

I have to think a little bit more on point 2, primarily, what I was thinking at the time around it.

riccardoporreca · 2020-02-11T15:57:21Z

Sure, time traveling is indeed more explicit about the cases we are assessing, and with consistent dates / logs the tests work equally well.

riccardoporreca · 2020-04-20T15:41:17Z

@gjtorikian, does it make sense to start merging the fix currently in places for the major point 1 and defer further thoughts about point 2 to another dedicated issue?

No being able to use caching due errors not being rechecked is quite a limitation. Would be cool to get this part fixed in a new upcoming release ;).

gjtorikian · 2020-04-20T15:59:04Z

Yeah, absolutely. Sorry I forgot about this one. I'll push a new patch release once the tests are green.

gjtorikian · 2020-04-20T16:05:42Z

Test is just Rubocop; this is out as 3.15.3. Thanks again!

riccardoporreca · 2020-04-20T16:20:06Z

🎉 Great, thanks for the quick action! 🚀

riccardoporreca added 4 commits February 7, 2020 01:22

Fix recheck of failures.

d0016ca

No extra empty line detected at block body beginning/end.

e8f81dd

minor - improve comments

29ed4e7

riccardoporreca marked this pull request as ready for review February 9, 2020 21:09

riccardoporreca requested a review from fulldecent as a code owner February 9, 2020 21:09

Provide Timecop updates

Unverified

This user has not yet uploaded their public signing key.

GPG key ID: 5BAAE86264ABDE0A

Learn about vigilant mode

5550450

Merge branch 'master' into fix-recheck-failure

Unverified

This user has not yet uploaded their public signing key.

GPG key ID: 5BAAE86264ABDE0A

Learn about vigilant mode

cd89a8a

gjtorikian merged commit 90b0925 into gjtorikian:master Apr 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix recheck of failures whith caching #556

Fix recheck of failures whith caching #556

riccardoporreca commented Feb 9, 2020

Uh oh!

codecov-io commented Feb 9, 2020 •

edited

Loading

Uh oh!

riccardoporreca commented Feb 11, 2020

Uh oh!

gjtorikian commented Feb 11, 2020

Uh oh!

riccardoporreca commented Feb 11, 2020

Uh oh!

riccardoporreca commented Apr 20, 2020

Uh oh!

gjtorikian commented Apr 20, 2020

Uh oh!

gjtorikian commented Apr 20, 2020

Uh oh!

riccardoporreca commented Apr 20, 2020

Uh oh!

	if within_timeframe?(cache['time'])
	next if cache['message'].empty? # these were successes to skip
	else
	urls_to_check[url] = cache['filenames'] # recheck expired links
	end

	handle_failure(effective_url, filenames, response_code, response.return_message)
	elsif method == :head

Fix recheck of failures whith caching #556

Fix recheck of failures whith caching #556

Conversation

riccardoporreca commented Feb 9, 2020

Uh oh!

codecov-io commented Feb 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

riccardoporreca commented Feb 11, 2020

Uh oh!

gjtorikian commented Feb 11, 2020

Uh oh!

riccardoporreca commented Feb 11, 2020

Uh oh!

riccardoporreca commented Apr 20, 2020

Uh oh!

gjtorikian commented Apr 20, 2020

Uh oh!

gjtorikian commented Apr 20, 2020

Uh oh!

riccardoporreca commented Apr 20, 2020

Uh oh!

codecov-io commented Feb 9, 2020 •

edited

Loading