Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix recheck of failures whith caching #556

Merged
merged 6 commits into from Apr 20, 2020

Conversation

riccardoporreca
Copy link
Collaborator

According to the README

Links that were failures are kept in the cache and always rechecked.

However, a few things do not work as expected, as shown in the following minimal example

mkdir my-jekyll-website
cd my-jekyll-website
bundle init
bundle install --path vendor/bundle
bundle add jekyll
bundle exec jekyll new --force --skip-bundle .
bundle add html-proofer
bundle install
echo "[foo.bar](https://fooooo.baaaar)" >> about.markdown
echo "[foo.bar](https://foo.baaaaar/)" >> about.markdown
echo "[foo.bar](https://www.github.com/foo.bar.foooooo)" >> about.markdown
bundle exec jekyll build
bundle exec htmlproofer --log-level :debug _site --timeframe 1h
bundle exec htmlproofer --log-level :debug _site --timeframe 1h

with the relevant part of the log from the 2nd call being

Adding https://fooooo.baaaar to cache check
Adding 1 link to the cache...
Removing https://fooooo.baaaar/ from cache check
Removing 1 link from the cache...
Checking 1 external link...

There are two issues here:

  1. URLs with failures are actually not rechecked.
  2. For the case of https://fooooo.baaaar, the URL is actually removed and added due to an issue with the trailing /.

This pull request concerns mainly point 1, which I believe is a genuine issue with

if within_timeframe?(cache['time'])
next if cache['message'].empty? # these were successes to skip
else
urls_to_check[url] = cache['filenames'] # recheck expired links
end

Since I noticed that unit tests existed and were supposed to cover such case, I actually did a preliminary cleanup of a few cache-related tests, where the existing logs and the time traveling were often not consistent. See 821ff0f (details in the commit message) where the test about re-checking failures is broken as in the example above, with URLs not being re-checked.
The actual fix is committed in d0016ca (+ few minor later commits).

Just a small disclaimer: I really like html-proofer and was happy to investigate the issue and contribute (also being familiar with unit testing and mocking), despite being pretty new to Ruby. Hope I did not overlook (too many) things.

As for the point 2, this is possibly related to effective_url being used instead of href when calling handle_failure (which eventually passes it to @cache.add)

handle_failure(effective_url, filenames, response_code, response.return_message)
elsif method == :head

Not sure if there was a reason for preferring effective_url in this context, if not I can include in this PR switching to href (I would also have an uncommitted test for this).

* Rely on mocking :within_timeframe? instead of time travel (previous dates were not consistent) and fix expectations on :add.
* Also fix the log for the recheck failure test: it did not contain the actual URL, which was then of added as new URL => This reveals a bug in handling failure re-checks.
@riccardoporreca riccardoporreca marked this pull request as ready for review February 9, 2020 21:09
@codecov-io
Copy link

codecov-io commented Feb 9, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@1dce201). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #556   +/-   ##
=========================================
  Coverage          ?   98.17%           
=========================================
  Files             ?       30           
  Lines             ?     1969           
  Branches          ?        0           
=========================================
  Hits              ?     1933           
  Misses            ?       36           
  Partials          ?        0
Impacted Files Coverage Δ
lib/html-proofer/cache.rb 96.73% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1dce201...29ed4e7. Read the comment docs.

@riccardoporreca
Copy link
Collaborator Author

I could have checked this already, but this misbehavior was apparently introduced recently (Nov 2019) with 85b67a0#diff-fe945704107273e5238902a425c3d386, and the logic was correct before then

@gjtorikian
Copy link
Owner

Hey, thanks for this! I really appreciate the careful PR.

I have made a few slight changes to address the tests for issue 1; I thought there was a bit too much stubbing of within_timeframe? and would prefer just the TImecop time traveling to assert that methods were (or were not) being called.

I have to think a little bit more on point 2, primarily, what I was thinking at the time around it.

@riccardoporreca
Copy link
Collaborator Author

Sure, time traveling is indeed more explicit about the cases we are assessing, and with consistent dates / logs the tests work equally well.

@riccardoporreca
Copy link
Collaborator Author

@gjtorikian, does it make sense to start merging the fix currently in places for the major point 1 and defer further thoughts about point 2 to another dedicated issue?

No being able to use caching due errors not being rechecked is quite a limitation. Would be cool to get this part fixed in a new upcoming release ;).

@gjtorikian
Copy link
Owner

Yeah, absolutely. Sorry I forgot about this one. I'll push a new patch release once the tests are green.

@gjtorikian
Copy link
Owner

Test is just Rubocop; this is out as 3.15.3. Thanks again!

@gjtorikian gjtorikian merged commit 90b0925 into gjtorikian:master Apr 20, 2020
@riccardoporreca
Copy link
Collaborator Author

🎉 Great, thanks for the quick action! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants