Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean trivy cache before each cycle #2168

Merged
merged 6 commits into from May 21, 2024
Merged

Conversation

moukoublen
Copy link
Member

@moukoublen moukoublen commented May 1, 2024

Summary of your changes

Clean trivy cache after each cycle

Trivy artifact.NewRunner calls initCache.

Depending on the provided configuration, the latter (initCache) could either return a runner or the error SkipScan (and nil runner).

In case one of the below flags is true, the artifact.NewRunner will perform the relevant clean/reset operation and return SkipScan.

  • CacheOptions.ClearCache
  • DBOptions.Reset
  • MisconfOptions.ResetPolicyBundle

This means that:

  1. The clean/reset operations are embedded inside initCache, called inside artifact.NewRunner. We cannot execute them standalonely.
  2. When at least one on those flags are true, the clean/reset operation takes place and returns SkipScan and nil runner, which means we cannot set the flags and get a valid runner with a single call (that's why I created a separate function ClearCache in my code changes).
  3. initCache could perform only one of each clean/reset operation and then return SkipScan, which means we cannot perform a single artifact.NewRunner call and init all of those flags to true and expect all those clear/reset operations to run. We need to do one at a time (that's why I created the loop inside ClearCache in my code changes).

Also Note: initCache with reset/clean flags could run only when no other bbolt client "holds" open the db. So it must be run before we create the artifact.NewRunner.

Screenshot/Data

With this PR the below happens at the beginning of the cycle:

Screenshot 2024-05-17 at 4 42 01 PM

Related Issues

Fixes: #2142

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have added the necessary README/documentation (if appropriate)

Introducing a new rule?

@mergify mergify bot added the backport-skip label May 1, 2024
@moukoublen moukoublen force-pushed the fix_cnvm_cache branch 2 times, most recently from 71782b7 to fccff44 Compare May 1, 2024 15:04
Copy link

github-actions bot commented May 1, 2024

📊 Allure Report - 💚 No failures were reported.

Result Count
🟥 Failed 0
🟩 Passed 359
⬜ Skipped 33

@elastic elastic deleted a comment from mergify bot May 1, 2024
@moukoublen moukoublen force-pushed the fix_cnvm_cache branch 2 times, most recently from c08ac8f to ff19585 Compare May 13, 2024 17:24
@moukoublen moukoublen changed the title Clean trivy cache after each cycle Clean trivy cache before each cycle May 14, 2024
@moukoublen moukoublen marked this pull request as ready for review May 14, 2024 14:00
@moukoublen moukoublen requested a review from a team as a code owner May 14, 2024 14:00
@moukoublen moukoublen force-pushed the fix_cnvm_cache branch 2 times, most recently from e960abf to da8bb81 Compare May 15, 2024 06:02
@@ -33,7 +34,13 @@ type VulnerabilityRunner struct {
}

func NewVulnerabilityRunner(log *logp.Logger) (VulnerabilityRunner, error) {
ctx := context.Background() // TODO: carry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo 😁

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, context.Background was used in this constructor function before these changes and thus the todo, because is out of the scope of this pr.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please open a follow-up task instead of todo 🙏

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok removed the comment and I will open a task

log.Debug("NewVulnerabilityRunner: New")

if err := clearTrivyCache(ctx, log); err != nil {
log.Warnf("error during runner cache clearing %s", err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a warning and not an error? I mean, this can lead to out of memory, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily; it depends on the error. Since it's not critical to cycle workflow, I used the warning level.

Update: changed it to Errorf.

}
}

// it should never go here (NewRunner should always return artifact.SkipScan and nil runner), but just in case it goes, lets close the runner.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If skip scan is always expected, I would invert the flow to highlight as main flow.

So

  if errors.Is(err, trivy.SkipScan) {
    continue
  }

  // handle error and close r here then 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was like that at first 6bd6349

But then I wanted to be 100% sure that if a runner is returned, we should close it.

Because there can be only one runner instance to hold the cache bbolt db open. So if for some reason, a runner instance is not closed, then no other instance can be initialized and it holds at the db open step for ever (happened to me during testing). So this is a really critical thing and we need to be 100% sure we will avoid it.

For the same reason I also wrote the Also Note in the description but I failed to describe that the opposite is also a for-ever-blocking situation (if something holds the db open, then the runner we use will never be initialized).

Copy link
Member

@romulets romulets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, don't we have any tests covering this?

@eyalkraft
Copy link
Contributor

@moukoublen would that change mean we download the entire vuln DB for every cycle?

@moukoublen
Copy link
Member Author

@eyalkraft I thought the DB is being downloaded on each cycle either way. But I will check and perhaps remove the db reset option.

@romulets I couldn't think of a way to test this, to be honest, other than the already existing test (constructor does not return an error). I couldn't think of a way to safely test that "ensure that the cache is cleared after init".

@moukoublen
Copy link
Member Author

moukoublen commented May 15, 2024

@eyalkraft

Apparently, one of the two databases being downloaded —the Java one— is cached for three days, so in that case, yes, we would lose that. The other one is downloaded on every run, no matter what.

example log

Repeater cycle triggered
Need to update DB
DB Repository: ghcr.io/aquasecurity/trivy-db
Downloading DB...

Java DB Repository: ghcr.io/aquasecurity/trivy-java-db:1
Downloading the Java DB...
The Java DB is cached for 3 days. If you want to update the database more frequently, the '--reset' flag clears the DB cache.

So I removed the {DBOptions: flag.DBOptions{Reset: true} and I will test again to verify that {CacheOptions: flag.CacheOptions{ClearCache: true}}, is enough (it should be).

@moukoublen moukoublen force-pushed the fix_cnvm_cache branch 3 times, most recently from e4a8fb5 to 958d698 Compare May 20, 2024 12:46
internal/vulnerability/runner.go Show resolved Hide resolved
@moukoublen moukoublen merged commit 3ea6792 into elastic:main May 21, 2024
24 checks passed
@moukoublen moukoublen deleted the fix_cnvm_cache branch May 21, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CNVM] Trivy local cache file size increases indefinitely
4 participants