perf(misconf): High memory usage (9.5 GB) and long scan time (45 min) on some repos #6557

simar7 · 2024-04-24T23:15:52Z

Discussed in #6549 and #6517

Originally posted by ptupitsyn on April 24, 2024

Description

Some repos, like https://github.com/kubernetes/minikube, take a very long time to scan (45 minutes on t3.xlarge) and consume up to 9.5 GB of RAM.

Desired Behavior

Memory consumption below 1 GB, scan time under 5 minutes.

Actual Behavior

Memory consumption of 9.5 GB, scan time 45 minutes.

Reproduction Steps

1. git clone https://github.com/kubernetes/minikube.git
2. cd minikube
3. docker run -v "$PWD":/myapp --entrypoint "trivy" aquasec/trivy --timeout 60m --quiet filesystem --scanners vuln,config --format json /myapp

Target

Filesystem

Scanner

Vulnerability

Output Format

JSON

Mode

Standalone

Debug Output

No output.

Operating System

Ubuntu 22.04

Version

0.50.2

Checklist

Run trivy image --reset
Read the troubleshooting

simar7 · 2024-04-25T01:36:59Z

Looks like the issue lies here

We seem to spend an awful lot of time getting the underlying metadata for the code snippets to show in the results. This involves reading each file that has a misconfiguration, which is expensive to do with large repos as there are many files.
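To illustrate the cost described above: if every result triggers its own file read, a file with many findings gets read many times. The following is a minimal Go sketch, not Trivy's actual code, of reading each file once and slicing the snippets out of a cache. The `finding` type, the `snippets` function, and the `readFile` callback are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// finding is a hypothetical stand-in for one misconfiguration result.
type finding struct {
	File      string
	StartLine int // 1-based, inclusive
	EndLine   int // 1-based, inclusive
}

// snippets returns one code snippet per finding and the number of file reads
// it performed. Each file is read at most once through readFile; later
// findings in the same file are served from the cache.
func snippets(findings []finding, readFile func(string) (string, error)) ([]string, int, error) {
	cache := map[string][]string{}
	reads := 0
	out := make([]string, 0, len(findings))
	for _, f := range findings {
		lines, ok := cache[f.File]
		if !ok {
			content, err := readFile(f.File)
			if err != nil {
				return nil, reads, err
			}
			reads++
			lines = strings.Split(content, "\n")
			cache[f.File] = lines
		}
		// Clamp the requested range to the lines that actually exist.
		start, end := f.StartLine, f.EndLine
		if start < 1 {
			start = 1
		}
		if end > len(lines) {
			end = len(lines)
		}
		if start > end {
			out = append(out, "")
			continue
		}
		out = append(out, strings.Join(lines[start-1:end], "\n"))
	}
	return out, reads, nil
}

func main() {
	files := map[string]string{"a.yaml": "one\ntwo\nthree"}
	read := func(name string) (string, error) {
		content, ok := files[name]
		if !ok {
			return "", fmt.Errorf("no such file: %s", name)
		}
		return content, nil
	}
	out, reads, err := snippets([]finding{
		{File: "a.yaml", StartLine: 1, EndLine: 2},
		{File: "a.yaml", StartLine: 3, EndLine: 3},
	}, read)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d snippets from %d read(s)\n", len(out), reads)
}
```

The cache trades memory for fewer reads. Holding every file's content at once could itself be costly on a large repo, so a real implementation would probably bound or evict it.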

Results

Input: https://github.com/kubernetes/minikube.git

Before

Doesn't finish in a reasonable time

After

./trivy --debug config ~/repos/trivy-issues/6557/minikube/

  194.62s user 5.54s system 152% cpu 2:11.61 total

Possible solutions

Maybe we should disable code snippets with such large repos (many files) as it is very expensive to read each file individually to know the source of misconfiguration.

To be clear, misconfigurations are still shown, just not the code snippets. They will look as follows:

Another idea is to introduce a new flag that lets users disable code snippets. Snippets would stay on by default (the current behavior), and users could turn them off for performance or by choice, as seen here.
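A rough Go sketch of how such a toggle could gate the expensive read, using only the standard `flag` package. Everything here is an assumption made for illustration: `include-code` is not an existing Trivy flag, and `options`, `finding`, `parseOptions`, and `render` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// options holds the hypothetical toggle. IncludeCode defaults to true,
// matching the current behavior where snippets are always shown.
type options struct {
	IncludeCode bool
}

// finding is a hypothetical stand-in for one misconfiguration result.
type finding struct {
	File    string
	Line    int
	Message string
}

// parseOptions parses args with the standard flag package.
// "include-code" is an illustrative name, not an existing Trivy flag.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("scan", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors off stderr in this sketch
	includeCode := fs.Bool("include-code", true, "attach code snippets to results")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return options{IncludeCode: *includeCode}, nil
}

// render formats a finding and calls load, the expensive file read,
// only when snippets are enabled.
func render(f finding, opts options, load func(finding) (string, error)) (string, error) {
	msg := fmt.Sprintf("%s:%d: %s", f.File, f.Line, f.Message)
	if !opts.IncludeCode {
		return msg, nil
	}
	code, err := load(f)
	if err != nil {
		return "", err
	}
	return msg + "\n" + code, nil
}

func main() {
	opts, err := parseOptions([]string{"--include-code=false"})
	if err != nil {
		panic(err)
	}
	f := finding{File: "deploy.yaml", Line: 12, Message: "container runs as root"}
	out, err := render(f, opts, func(finding) (string, error) {
		return "", fmt.Errorf("load should not be called when snippets are off")
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With the toggle off, the finding is still reported and only the snippet lookup is skipped, which matches the behavior proposed above.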

DmitriyLewen · 2024-04-25T04:51:53Z

I also have some thoughts:

I don't have much experience with trivy/iac, so I could be wrong.
Why do we need to read the files again to get the offending code? Could we store the cause code when we find it, as we do for secrets, where we save the previous and next lines as soon as a secret is detected? My idea: scan file -> detect misconfiguration -> save the offending line in the Result. @simar7, correct me if this is not possible.
This should save time, but we would still need to double-check memory usage.
Do we need to read files if the check passes?

ptupitsyn · 2024-04-25T08:00:12Z

It would be great to be able to toggle code snippets with a CLI flag. Disable them when not needed to improve performance.

DmitriyLewen · 2024-04-25T08:13:24Z

@simar7 I agree with @ptupitsyn.
I would also choose a new flag to add more variety to the Trivy experience.

knqyf263 · 2024-04-25T12:05:33Z

I'm still wondering why it consumes 9.5 GB. If it reads each file individually, it doesn't use so much memory. Or does it keep all the file content in memory?

kaypee90 · 2024-04-27T18:19:59Z

Quoting @ptupitsyn: "It would be great to be able to toggle code snippets with a CLI flag. Disable them when not needed to improve performance."

I agree with @ptupitsyn's approach.

simar7 added the kind/bug and scan/misconfiguration labels Apr 24, 2024

simar7 self-assigned this May 1, 2024

simar7 added this to the v0.52.0 milestone May 1, 2024

This was referenced May 1, 2024:

feat(misconf): Add --disable-causes flag #6585 (Draft)

perf(misconf): Improve cause performance #6586 (Merged)

simar7 modified the milestones: v0.52.0, v0.51.0 May 3, 2024

simar7 closed this as completed in #6586 May 3, 2024
