New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow cops to share context between files #7968
Comments
Easiest to implement (for RuboCop and you) would probably be class level callbacks that any Cop class can optionally define (e.g. An easier alternative to your |
@marcandre that's an interesting idea, I always get a little nervous amount monkey patching given the wild west ruby used to be back in the day 😄. Those class level callback are a cool idea I hadn't considered but as you pointed out that would probably be one of the easier solutions to implement. I know this is a weird feature request since its not technically Rubocop's issue since without rubocop-daemon it works as expected. The solution I posted to look for the last instance of the I was actually a bit surprised I couldn't find any other examples of Cops that look for validations across files, I sort of would have thought this use case would be more common. If there is interest beyond my use case I'd be happy to put together an implementation of the idea you proposed, if not I'm also happy to close this and let it be a point of reference for someone else trying to do something similar. |
This is something that would be incredibly useful for the Cookstyle project which uses the RuboCop engine. We're analyzing Chef cookbooks and we've been pretty successful at it with the isolated file design of RuboCop. There's a lot of things that would be really great to build out that need to take into consideration the entire cookbook and therefore need a way to check the ast of multiple files. I'm currently walking the directory tree to find our metadata file and then reparsing that, but it's pretty slow to parse that file over and over again. Having some sort of way to grab that file's parsed ast out of cache and use a matcher against it would be really amazing. |
I'm slaving on a refactoring of the @bbatsov how does instances of cops being reused for different files sound like? That would impact only Cops what explicitly sign in to API I'm working on |
@tas50 super interesting use case, I knew I couldn't have been the only one who had a use case that required looking across source files. I'm not sure what the most extensible solution would be here, the class level callbacks would likely only satisfy my use case and help with rubocop-daemon type environments. If there's anything I can do to help with planning/implementation of a proposed solution I'd be happy to contribute. |
I think I've mostly hit the wall with what I can do one file at a time so I'm thinking about what I can achieve with helpers and ast caching at this point. Give me some more time and I'm sure I'll have more opinions on how this could potentially look in a perfect world. I have 190 cops for Chef now that we ship to all our customers. It's been a ton of fun to write these. Great platform overall and I think I can bend it to achieve the next phase of what I need. |
My PR now includes modern Cops being persisted accross files. The callbacks (now named
It's not clear if this persistence is on for all modern cops (inheriting from |
Makes sense to me. A bit of historical context - originally we went for the share-nothing approach mostly because in Ruby you can't really make much meaningful cross-file analysis - you don't know what code is loaded in the file you're inspecting (unlike in a language like Python, Java or Clojure), you don't know in which order the source files were loaded, etc. That meant other than looking for code duplications, there wasn't much potential for cross-file analysis and so we focused on analyzing files in isolation, which as a bonus was simpler to implement. |
Given that the majority of cops (ours or custom) don't need to keep context and that it is difficult to test that a cop is clearing it's cache (if any), I'm thinking it would be best that persistence be opt-in, say:
Otherwise cops get re-created as before. Sounds good? Note: our mixins should still support it and clear their cache though. As much as my |
Let me ask you a couple of questions @ajorgensen Do you namespace your engines? Are there any cops that would benefit from this feature? |
Do you namespace your engines? Do you use isolate_namespace? What happens if you extract an engine to a gem, how would you make sure the new engine from your monolith doesn't overlap with it? Are there any cops that would benefit from this feature? |
I could find more than that https://github.com/search?l=Ruby&q=isolate_namespace&type=Code # isolate_namespace is causing problems accessing the app's url helpers when the app
# mixes in methods to a controller.
# <%= logout_path %> <-- this will crash, no url helpers are accessible but I can't tell how fair is this though.
Let me reiterate this. What if you have
RuboCop is a static code analyzer.
That was the answer I was hoping to get, thanks. |
Seems to me that some cops looking for duplications could definitely benefit from it. Duplicated spec, duplicated constant, duplicated method, duplicated algorithm, ... |
A spec for the same class in a different folder?
|
@pirj Im not sure I understand your point. In the codebase that I am working on, and the motivation for this feature request, we have an explicit need for finding duplicate class names across many rails engines. Im sorry if I am misunderstanding your intention but I am not advocating for a general purpose duplicate class finder to be built into Rubocop, simply the ability to share context between cop instance during a run to allow any cross file type analysis. Rubocop today does not support comparing across different files in general as it has a "share nothing" model for the cops at runtime, which totally makes sense and makes the design much easier to reason about. For the custom cop we wrote for the codebase I work on we hacked around this limitation by using a class variable but it comes with its drawbacks specifically when attempting to cache cop runs to speed up execution, as I've mentioned this is more of a rubocop-daemon problem not Rubocop itself but I think it can really only be solved at the Rubocop level by providing a mechanism to share context between cop instances. |
Could be, or testing the same thing (say if there's a
Sorry, I wasn't clear, I meant duplicated values, say the same
Exactly, for flay-like processing... |
What I mean is that bending RuboCop to do cross-file checks may not overweigh the added complexity given that apart from this single use case there are other tools that do this job better (flay) or even no other dedicated tool dares to do that. Keeping in mind that Rails engines can be isolated out of the box if written correctly, I would say a cop in |
I'm not trying to shot down the idea, just not too convinced that it is really worth it. |
It's basically done, I just have to test it and close this issue, and it really wasn't complex. |
@marcandre were you still working on this? |
Totally slipped my mind, and I don't intend to, so I'm closing this. |
Is your feature request related to a problem? Please describe.
tl;dr It would be great to have a way to have a cop run across all processed sources to compare between files.
On a project I work on we make use of rails engines to separate different parts of our codebase into smaller components. As such we have a use case where we need to make sure there are no duplicate controller names in any of the rails engines so for example two separate rails engines should not be allowed to create the duplicate class name FooController.
Initially this was solved with a class level cache which would store all of the class names that had been seen and add an offense in there were any duplicates. This was a bit of a hack but since Rubocop creates a new cop instance per processed source this was the only way to share information between processed source files. When we added rubocop-daemon this became a problem because the cache now persists between runs due to the long running process. This leads to false positives because the cache will likely already contain all of the class names that we saw on the last run.
I know this isn't necessarily Rubocop's issue since its really the long running process via rubocop-daemon that creates the problem but I think that the issue can only really be solved within Rubocop itself.
Describe the solution you'd like
There are three solutions I can think of. The easiest might be to provide a run identifier that can uniquely identify each different rubocop run. This would allow the user to check this ID during cop execution and determine if they want to clear the cache wherever it is stored. This might be the easiest and is essentially how I solved it for our cop (i'll explain my current solution further down).
A second solution would be to have a
context
object that is created at the beginning of the run and allow cops to store and share context between each of the cops. In my case, this would allow my cop to store the different class names which can be checked in the next instance of the cop for the next processed file. This would probably be the most flexible and fit into the current model.Thirdly would be to change the current contract of a cop to allow it to be long lived. This would be to create all of the cop instances up front and then use those instance for each processed file. This would also likely require the addition of a few hooks that can be used by cops
before_run
,after_run
, etc to allow the cop to manually clean up any state that it has.Describe alternatives you've considered
I have currently worked around the issue essentially implementing idea #1. In the cop we still have the class level cache but we also keep track of the object_id of the Rubocop::Runner instance.
This instance is long lived for the life of the run and is recreated in rubocop-daemon for each run. This gives us a unique identifier that can be checked against for each processed file. If the object_id has changed then we know its a new run and can clear the cache.
Additional context
None
The text was updated successfully, but these errors were encountered: