How does the cache file work? #13540
Hi there,

This is Nianjun from the Apache/ShardingSphere team. Recently, we added checkstyle:check as a required check in our GitHub action. This action is triggered by every single pull request and checks the Java code style for all modules. I noticed that running checkstyle:check takes almost 1 minute to check nearly 300 modules. This is too long for a reviewer to wait before merging the code.

I know that Checkstyle generates a cache file called 'checkstyle-cachefile' in the target folder. So I tried caching all 'checkstyle-cachefile' files in the target folder by [...]. Interestingly, when I execute the command [...]. I tried downloading these 'checkstyle-cachefile' files from GitHub's cache and putting them in each module's target folder, but the check still takes 1 minute to complete.

I'm not sure whether 'checkstyle-cachefile' has an expiration period, depends on some environment information, or something else entirely. Could you please provide some tips or give me a brief description of how 'checkstyle-cachefile' works? Thank you!
Replies: 1 comment
Basically, the cache file is a plain text file. It contains the list of files processed by Checkstyle with their last-updated timestamps, plus the configuration hash. The hash is generated from the contents of the configuration.

If the configuration hash is different on a new run, the entire cache file is wiped and the run behaves as if no cache file had ever existed, reprocessing all files and modules. When I say configuration, this includes files connected to the XML configuration but stored separately, like suppression files, import control files, etc.

Files processed and recorded in the cache file are assumed to be free of violations. This means a recorded file does not need to be processed again by Checkstyle as long as "nothing" changes. During a run, when a file has the same timestamp as in the cache, Checkstyle skips it. It essentially assumes the file was processed before and came out clean. When a newer timestamp is found, Checkstyle does not skip the file and reprocesses it, checking for new violations. If violations are found, the file is removed from the cache. If none are found, the file is added to (or updated in) the cache with its timestamp. A file that was never processed before is handled as if its timestamp had changed: it is processed. If a file is listed in the cache file but is never processed again, it stays there until it is either processed again or wiped (its line or the complete file).

The cache file does not "expire". It either exists or it doesn't. You configure where the cache file is saved ( https://checkstyle.org/config.html#Checker_Properties ). If you store it in the target folder, it could be wiped when running the clean phase. I believe Maven runs have an issue with relative locations if you are using a multi-project POM.
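The skip/reprocess rules described above can be modelled roughly as follows. This is only a sketch of the behaviour as explained in this reply, not Checkstyle's actual implementation: the class `CacheSketch`, its method names, and the literal hash strings are all hypothetical, and the comparison is modelled as a simple timestamp equality check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the cache described above; NOT Checkstyle's real code. */
class CacheSketch {
    // Path of each file that came out clean, mapped to the timestamp it had then.
    private final Map<String, Long> cleanFiles = new HashMap<>();
    private String configHash;

    /** Start of a run: a different configuration hash wipes every entry. */
    void startRun(String currentConfigHash) {
        if (!currentConfigHash.equals(configHash)) {
            cleanFiles.clear();
            configHash = currentConfigHash;
        }
    }

    /** A file is skipped only if it is recorded with exactly this timestamp. */
    boolean canSkip(String path, long lastModified) {
        Long cached = cleanFiles.get(path);
        return cached != null && cached == lastModified;
    }

    /** After processing: clean files are recorded, files with violations are dropped. */
    void recordResult(String path, long lastModified, boolean hasViolations) {
        if (hasViolations) {
            cleanFiles.remove(path);
        } else {
            cleanFiles.put(path, lastModified);
        }
    }
}

class CacheSketchDemo {
    public static void main(String[] args) {
        CacheSketch cache = new CacheSketch();
        cache.startRun("hash-A");                      // hypothetical hash value
        cache.recordResult("Foo.java", 1000L, false);  // processed, no violations

        System.out.println(cache.canSkip("Foo.java", 1000L)); // true: same timestamp
        System.out.println(cache.canSkip("Foo.java", 2000L)); // false: timestamp differs

        cache.startRun("hash-B");                      // configuration changed
        System.out.println(cache.canSkip("Foo.java", 1000L)); // false: cache was wiped
    }
}
```

Note that in this model the file content plays no role, only the timestamp does. If a CI checkout assigns fresh modification times to the source files, a restored cache file would presumably produce no hits, which may be worth checking in a setup like the one in the question.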
Be careful not to use the same cache file for different configurations, as each will have a different configuration hash and each one will reset the cache on the subsequent run. Also make sure you always run Checkstyle the same way, otherwise the cache file could be wiped erroneously. We had an issue in our repo where different types of runs (Maven versus Ant versus command line) produced different configuration hashes even though everything looked the same (see #3566).

Our complete CI takes almost an hour (if not more) to complete. 1 minute is very quick in comparison.
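To illustrate the warning about sharing one cache file between configurations, here is a small sketch. It assumes a hypothetical hashing scheme (SHA-256 over the main configuration text plus any connected files); the actual algorithm and inputs Checkstyle uses are not stated in this thread, and `ConfigHashSketch`, `hash`, and `countWipes` are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration; the hashing scheme here is an assumption. */
class ConfigHashSketch {

    /** Hashes the main configuration together with connected files (e.g. suppressions). */
    static String hash(String... contents) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String content : contents) {
                digest.update(content.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between the individual inputs
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    /** Counts how many runs in a sequence would wipe a cache file shared by all of them. */
    static int countWipes(String... hashPerRun) {
        String stored = null; // no cache file exists before the first run
        int wipes = 0;
        for (String current : hashPerRun) {
            if (stored != null && !stored.equals(current)) {
                wipes++;
            }
            stored = current;
        }
        return wipes;
    }
}

class ConfigHashDemo {
    public static void main(String[] args) {
        String mainConfig = "<module name=\"Checker\"/>";
        String a = ConfigHashSketch.hash(mainConfig, "suppressions v1");
        String b = ConfigHashSketch.hash(mainConfig, "suppressions v2");

        System.out.println(a.equals(b));                           // false
        System.out.println(ConfigHashSketch.countWipes(a, b, a, b)); // 3
        System.out.println(ConfigHashSketch.countWipes(a, a, a, a)); // 0
    }
}
```

In this model, alternating two configurations on one cache file wipes it on every run after the first, so neither configuration ever benefits from the cache. Giving each configuration its own cache file location avoids that.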