Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent explicit engine/lib paths from being excluded from app tree #1679

Closed
dgholz opened this issue Feb 9, 2022 · 5 comments · Fixed by #1699
Closed

Prevent explicit engine/lib paths from being excluded from app tree #1679

dgholz opened this issue Feb 9, 2022 · 5 comments · Fixed by #1699

Comments

@dgholz
Copy link

dgholz commented Feb 9, 2022

Related problem description
I was coming up with a workaround for #1677 & hit on adding the path to the gem as an engine path. Happily, it worked great locally. Unhappily, it failed on CI. We're using https://github.com/ruby/setup-ruby, and it configures Bundler to install gems into vendor/cache. Brakeman skips over vendor (good) but also ignores any additional engine or lib paths that match vendor (oh no).

Description of the solution you'd like
When I specify a lib or engine path in config or a command-line argument, have it be scanned even if it matches one of the reject patterns in Brakeman::AppTree.

Alternatives you've considered

  1. running with --no-skip-vendor
    Pretty hopeless, some of the gems we're using are tricky for Brakeman to parse & I'd prefer to work on our own code instead of making PRs to third-party projects
  2. moving the gem contents to an engine
    This is what we had previously, we're trying to separate the code so it's easier to reuse in other projects
  3. copy the code into another directory just before running Brakeman
    The current workaround, extremely fiddly to explain to other devs how to do it before they run Brakeman locally

Additional context
I'm discussing how to get ruby/setup-ruby to not put gems in vendor in ruby/setup-ruby#291

@presidentbeef
Copy link
Owner

When I specify a lib or engine path in config or a command-line argument, have it be scanned even if it matches one of the reject patterns in Brakeman::AppTree.

Agree.

@jrafanie
Copy link
Contributor

jrafanie commented Mar 2, 2022

We also ran into this in ManageIQ/manageiq#21751 where our CI runs with bundler installing gems into vendor/bundle. We explicitly call Brakeman.run with :engine_paths => list_of_paths but this is ignored because the exclusion occurs after the inclusion. Note, using skip_vendor => false isn't an option as we still want to exclude gems not in our engine_paths and also to skip such things as test directories.

The PR above monkey patches brakeman temporarily to only exclude vendor paths that aren't in the requested @engine_paths. This fix is strategic because that's what we use but as the OP said, we probably need to do this for the libs path also.

index fcc927665..d1f0b3a23 100644
--- a/lib/brakeman/app_tree.rb
+++ b/lib/brakeman/app_tree.rb
@@ -205,7 +205,7 @@ module Brakeman
       paths.reject do |path|
         relative_path = path.relative

-        if @skip_vendor and relative_path.include? 'vendor/'
+        if @skip_vendor and relative_path.include? 'vendor/' and @engine_paths.none? { |p| path.absolute.include?(p) }
           true
         else
           EXCLUDED_PATHS.any? do |excluded|

The code above works in properly excluding the default global excludes while still allowing our requested engine paths to be scanned.

If desired, I can draft a PR and tests to extract the engine_paths and lib_paths into methods in_engine_path and in_lib_path so that the conditional could be a bit easier to read:

if @skip_vendor and relative_path.include? 'vendor/' and !in_engine_path(path) and !in_lib_path(path) }

Note, this change is quite inefficient because we need to traverse the engine_paths and lib paths list 2x times where x is the number originally included files to scan.

It feels like we could move these various lists: "default includes" and "excludes" and "explicit includes" into 3+ Dir.glob calls so you could do "default includes" - "excludes" + "explicit includes". I would think Dir.glob with Array math would be faster than all of loops. Note, there's also the the only_files option so that would have to be considered also.

I don't know if my suggested surgical fix expanded for lib and engine paths would be a good first step to resolving this bug. If you're not using lib or engine paths, the performance impact should be minimal.

@presidentbeef
Copy link
Owner

I'd be happy if someone wants to dive into this!

I did look through the current code and it's not an easy change. Essentially, the excluded paths are the last thing that happens. That makes it difficult to give priority to explicitly-allowed paths.

@jrafanie
Copy link
Contributor

jrafanie commented Mar 2, 2022

I'd be happy if someone wants to dive into this!

I did look through the current code and it's not an easy change. Essentially, the excluded paths are the last thing that happens. That makes it difficult to give priority to explicitly-allowed paths.

Thanks for responding.

Yeah, my fix above should be ok although it's not terribly efficient. If that's fine, I can try doing it for "lib" and "engine" paths. I'll try to open a PR with tests soon. We can discuss it there.

I do think glob matching of files: "includes" - "excludes" + "lib paths" + "engine paths" would be more efficient but like I mentioned above, there's other caveats like only_files and other details I'm not familiar with.

Thanks!

jrafanie added a commit to jrafanie/brakeman that referenced this issue Apr 6, 2022
Fixes presidentbeef#1679

Currently, we find files that match various patterns such as the engine or
lib paths and exclude them if they reside in a vendor directory.

Instead, we should honor explicitly-allowed paths, even if we're excluding
vendor and they reside in a vendor directory.

Note, there are several globally EXCLUDED_PATHS such as:
/generators, test/, log/ that are still excluded even if they're in opt-in
engine/lib paths. This still seems reasonable and a test is included to
demonstrate this behavior is unchanged.
@jrafanie
Copy link
Contributor

jrafanie commented Apr 6, 2022

Ok, I finally got back to looking at this. Here's the surgical fix: #1699

jrafanie added a commit to jrafanie/brakeman that referenced this issue Apr 6, 2022
Fixes presidentbeef#1679

Currently, we find files that match various patterns such as the engine or
lib paths and exclude them if they reside in a vendor directory.

Instead, we should honor explicitly-allowed paths, even if we're excluding
vendor and they reside in a vendor directory.

Note, there are several globally EXCLUDED_PATHS such as:
/generators, test/, log/ that are still excluded even if they're in opt-in
engine/lib paths. This still seems reasonable and a test is included to
demonstrate this behavior is unchanged.
Repository owner locked and limited conversation to collaborators May 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants