Calculate rubocop_checksum by calculating a crc32 for each file rather than using mtime #8633
I'm on the fence about this, as I'm always wary of willingly hurting performance. One middle ground would be to let users choose the caching strategy with a command-line flag or a configuration option. How does this sound? //cc @p0deje
I didn't realize we made a digest of the executable. I'm not familiar with this code, so I might be off track here, but when we are run with
This way, a user using only official gem releases would have near instant checksum calculation (only I mean, it's a bit of a shame that users running RuboCop vX.Y.Z have to recalculate the checksum for the whole source, when "X.Y.Z" is a sufficient checksum.
@bbatsov I totally agree about purposely making things less performant, but I think the tradeoff currently is real in that we're talking about milliseconds to calculate the checksum vs minutes to run without a cache. Obviously that's specific to my personal case, but I think it's still a valid point. A command line flag sounds good to me, that's something that can be set up easily in CI without having to affect local development. Also good with @marcandre's suggestions! Anything that simplifies the check here works well with me. I'm not 100% sure if I understand how to implement what you're suggesting (particularly how to determine what to check on
I want to second @dvandersluis's point that the cache build performance hit matters less than the very real performance hit of running RuboCop in CI without a cache.
Here's what I'd do as an easy way to shave a bit of time:
# lib/rubocop.rb
require 'English'
before_us = $LOADED_FEATURES.dup
#... rest of the gazillion requires
if we_are_a_gem
RuboCop::ResultCache.rubocop_required_features = $LOADED_FEATURES - before_us
end
# lib/rubocop/result_cache.rb
# add a `rubocop_required_features` singleton class read write attribute, `@api private`, initialized with `[]`
# ...
source_files = $LOADED_FEATURES + Find.find(exe_root) - ResultCache.rubocop_required_features
digest << RuboCop::Version::STRING << RuboCop::AST::Version::STRING
That should save going through more than 900 files... Not sure what's the best Let me know if you'd prefer me to do it.
BTW, on my "memory-choked-and-not-getting-any-younger" 2013 MacBook Air, the crc32 takes ~0.06 seconds (including RuboCop's own source)
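For anyone who wants to reproduce a timing like this on their own machine, here is a minimal sketch. It assumes the measurement is a CRC32 over the contents of every file in `$LOADED_FEATURES`; the exact file set used in the comment above is not stated, so treat the numbers as indicative only.

```ruby
require 'benchmark'
require 'zlib'

# Assumption: approximate "the files to checksum" by whatever Ruby has loaded.
# $LOADED_FEATURES can list built-in features that are not real paths,
# so keep only entries that exist on disk.
files = $LOADED_FEATURES.select { |path| File.file?(path) }

crc = 0
elapsed = Benchmark.realtime do
  # Fold each file's content into a single running CRC32 value.
  files.each { |path| crc = Zlib.crc32(File.binread(path), crc) }
end

puts format('crc32 over %d files took %.4f s', files.size, elapsed)
```

Running this after `require 'rubocop'` would include RuboCop's own source in the file set.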
@marcandre yeah if you have an idea of how to fix this, go for it! I was playing around with
Yeah, my initial idea was overly complex. Let me whip up a PR then.
#8643 should be merged soon. I think this present PR should be merged (although a Changelog entry would be nice).
83f6136 to 55be764
@marcandre great! Updated with a changelog entry.
…for each file rather than using mtime. File.mtime is faster, but inconsistent in CI, because in each CI build the mtime changes, which means that the rubocop cache implicitly cannot be used as-is. It's obviously slower to calculate a hash for each file, but this is still more performant (both in memory consumption and iterations per second) than the original.
55be764 to a4b39be
@bbatsov still on the fence?
Guess not. 😄 Might be a good idea to add a comment near the code about the CI impact, so that nobody optimizes it away again in the future. :-)
Sure, I added a small comment 👍
Thanks!
Thanks guys! 🙌
Fixes #8629.
File.mtime is faster, but inconsistent in CI, because in each CI build the mtime changes, which means that the rubocop cache implicitly cannot be used as-is. It's obviously slower to calculate a hash for each file, but this is still more performant (both in memory consumption and iterations per second) than the original.
I wrote a benchmark to test memory and IPS: https://gist.github.com/dvandersluis/0ddbc936fc858fbbec9cf8fb796ec9a2
Obviously just accessing a file stat is going to be fastest, but that doesn't help when the results are inconsistent. I also tested some other options, such as other non-cryptographic hashes that are supposed to be fast (like MurmurHash), and got worse results than with crc32 (plus those involve adding a gem, which I think is better to avoid).
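To illustrate why a content hash stays stable where mtime does not, here is a minimal sketch. The helper names `content_checksum` and `mtime_checksum` are hypothetical and are not RuboCop's actual methods; the sketch only uses `Zlib.crc32` and `File.mtime` from the standard library.

```ruby
require 'zlib'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: fold the content of every file into one CRC32 value.
def content_checksum(paths)
  paths.sort.reduce(0) { |crc, path| Zlib.crc32(File.binread(path), crc) }.to_s
end

# Hypothetical helper: a checksum built only from modification times.
def mtime_checksum(paths)
  paths.sort.map { |path| File.mtime(path).to_f.to_s }.join(',')
end

crc_stable, mtime_stable = Dir.mktmpdir do |dir|
  file = File.join(dir, 'example.rb')
  File.write(file, "puts 'hello'\n")

  crc_before = content_checksum([file])
  mtime_before = mtime_checksum([file])

  # Simulate a fresh CI checkout: identical content, new modification time.
  FileUtils.touch(file, mtime: Time.now + 3600)

  [crc_before == content_checksum([file]), mtime_before == mtime_checksum([file])]
end

puts "crc32 checksum unchanged: #{crc_stable}"
puts "mtime checksum unchanged: #{mtime_stable}"
```

Because the content-based value depends only on the bytes in each file, a cache keyed on it can survive a new checkout, which is the situation described for CI builds.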
If this is too slow still, a thought I had would be to switch how to calculate the checksum based on ENV['CI']. Please let me know if you'd like me to make that change, or if there's another strategy to try.
Before submitting the PR make sure the following are checked:
[Fix #issue-number] (if the related issue exists).
master (if not - rebase it).
bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.
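The `ENV['CI']` idea floated in the description could look roughly like the sketch below. This is only an illustration of the proposed alternative, not something this thread shows being implemented; `file_fingerprint` is a hypothetical name, and the environment is passed in as a parameter so the behaviour is easy to exercise.

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical helper: pick the fingerprint strategy from the environment.
# With CI set, hash the file content (stable across checkouts);
# otherwise fall back to the cheaper modification time.
def file_fingerprint(path, env = ENV)
  if env['CI']
    Zlib.crc32(File.binread(path)).to_s
  else
    File.mtime(path).to_f.to_s
  end
end

ci_value, local_value = Tempfile.create('sample') do |f|
  f.write("x = 1\n")
  f.flush
  [file_fingerprint(f.path, { 'CI' => 'true' }), file_fingerprint(f.path, {})]
end

puts "CI fingerprint:    #{ci_value}"
puts "local fingerprint: #{local_value}"
```

One trade-off of this design is that CI and local runs would produce different cache keys for the same source, so a cache built in one environment could not be reused in the other.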