[CNVM] Trivy local cache file size increases indefinitely #2142

Closed
moukoublen opened this issue Apr 17, 2024 · 3 comments · Fixed by #2168
Labels: 8.15 candidate, bug, Team:Cloud Security, Vulnerability Management

Comments

@moukoublen
Member

moukoublen commented Apr 17, 2024

Describe the bug
Trivy uses a local file (bbolt db) as a cache in the /tmp directory (/tmp/trivy/fanal/fanal.db) that always increases in size with each cycle.

As a result, the tmpfs file system that holds /tmp fills up (1), and Cloudbeat can no longer download the new Trivy db (which it does on each cycle). Cloudbeat then enters a crash loop and stops producing CNVM findings. It could also affect other applications hosted on the same instance that rely on /tmp for any crucial operation.

(1) /tmp (tmpfs) is a RAM disk with a maximum size, usually half of the host's total RAM.

(Example screenshots of fanal.db size before and after some runs, taken 2024-04-16 11:01 AM, 2024-04-16 11:38 AM, and 2024-04-17 9:03 AM)

[ec2-user ~]$ sudo tree -h /tmp
/tmp
└── [   60]  trivy
    └── [   60]  fanal
        └── [ 1.9G]  fanal.db

12 directories, 1 file
[ec2-user ~]$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       587Mi       8.4Gi       1.9Gi       6.3Gi        12Gi
Swap:             0B          0B          0B

Preconditions
Any cnvm deployment.

To Reproduce

  1. Create a cnvm deployment with agent version >= 8.12 (older releases still need to be checked).
  2. Wait for many runs to pass (how many depends on the number of cloud assets and the host's RAM size).

Expected behavior
Cloudbeat will be able to work indefinitely and produce events on each cycle.

Workaround till the fix
Restarting the host machine deletes everything in /tmp, including fanal.db, so Cloudbeat can resume working and producing findings.

@moukoublen added the bug and Team:Cloud Security labels on Apr 17, 2024
@romulets
Member

@orestisfl didn't you also create a ticket for this?

@orestisfl
Contributor

@orestisfl didn't you also create a ticket for this?

https://github.com/elastic/security-team/issues/8217

@moukoublen
Member Author

moukoublen commented Apr 22, 2024

I took a look into the ticket https://github.com/elastic/security-team/issues/8217

It seems the root cause is the same, but we just got a different error during the db update flow.

Trivy uses this function to determine its cache directory:
https://github.com/aquasecurity/trivy/blob/d4da83c633a46ad4a61844d8d5502d87b99465a0/pkg/utils/fsutils/fs.go#L23-L29

func defaultCacheDir() string {
	tmpDir, err := os.UserCacheDir()
	if err != nil {
		tmpDir = os.TempDir()
	}
	return filepath.Join(tmpDir, "trivy")
}

In most cases this returns a cache directory on the regular file system (e.g. /root/.cache/trivy when run as root).

If os.UserCacheDir() returns an error, however, it falls back to /tmp.

On Linux, os.UserCacheDir() returns an error when neither the XDG_CACHE_HOME nor the HOME environment variable is defined:
https://cs.opensource.google/go/go/+/master:src/os/file.go;l=501-510?q=UserCacheDir&ss=go%2Fgo

default: // Unix
    dir = Getenv("XDG_CACHE_HOME")
    if dir == "" {
        dir = Getenv("HOME")
        if dir == "" {
            return "", errors.New("neither $XDG_CACHE_HOME nor $HOME are defined")
        }
        dir += "/.cache"
    }

Which, in our case, they are not.

$ sudo cat /proc/$(pidof cloudbeat)/environ | tr '\0' '\n'
PWD=/opt/Elastic/Agent
SYSTEMD_EXEC_PID=2095
LANG=C.UTF-8
INVOCATION_ID=...
SHLVL=0
JOURNAL_STREAM=...
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
AGENT_COMPONENT_ID=cloudbeat/vuln_mgmt_aws-default
AGENT_COMPONENT_TYPE=cloudbeat/vuln_mgmt_aws

When Cloudbeat runs under elastic-agent, it does not inherit all environment variables.

So that explains both error logs we had.

The definition of done for https://github.com/elastic/security-team/issues/8217 was to find the root cause of the issue (apart from solving it). Since the root cause was found during the investigation that led to this ticket, I will close that one as done and refer to this one, if there is no objection.
