feat(scanner): Write vulnerabilities and enrichments using iterators #11168

jvdm · 2024-05-18T02:27:25Z

Description

Adopt ClairCore's iterator interface to update vulnerabilities and enrichments. Create a fork of the JSON blob reader to allow individual record reading per operation in the JSON blob files (where vulnerabilities are stored in our bundles). Create our own "offline importer" method, Updater.Import(), which leverages the new JSON blob interface and the ClairCore datastore iterators.

The result is that the Matcher keeps one vulnerability in memory at a time, and the datastore code will batch vulnerabilities before sending them to the DB, effectively reducing memory consumption to a minimum during vulnerability updates.

Checklist

Investigated and inspected CI test results
Unit test and regression tests added
Evaluated and added CHANGELOG entry if required

Testing Performed

Here I tell how I validated my change

Deploy to openshift, ~~enable ROX_SCANNER_V4_MULTI_BUNDLE=true, change vulnerability URL to https://definitions.stackrox.io/v4/vulnerability-bundles/dev/vulnerabilities.zip~~, and verified successful vulnerability bundle download and writes to the DB.

Reminder for reviewers

In addition to reviewing code here, reviewers must also review testing and request further testing in case the
performed one does not seem sufficient. As a reviewer, you must not approve the change until you understand the
performed testing and you are satisfied with it.

rhacs-bot · 2024-05-18T03:00:11Z

Images are ready for the commit at 90bff64.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.4.x-835-g1c824c3b32.

codecov · 2024-05-22T02:35:21Z

Codecov Report

Attention: Patch coverage is 76.81159% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 48.00%. Comparing base (eb354f0) to head (dd6b5bc).

Files	Patch %	Lines
scanner/matcher/updater/vuln/updater.go	76.81%	12 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #11168      +/-   ##
==========================================
+ Coverage   47.98%   48.00%   +0.01%     
==========================================
  Files        2330     2330              
  Lines      166761   166827      +66     
==========================================
+ Hits        80022    80077      +55     
- Misses      80386    80393       +7     
- Partials     6353     6357       +4

Flag	Coverage Δ
go-unit-tests	`48.00% <76.81%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

RTann

A few initial notes. Honestly this iterator stuff is looking pretty confusing, and it may take me a while to fully understand what is going on and why this may work.

In the meantime, have you tried running this in a cluster yet? I'd love to see some metrics comparisons between the old way and this PR

CHANGELOG.md

scanner/matcher/updater/vuln/updater.go

scanner/updater/jsonblob/jsonblob.go

scanner/matcher/updater/vuln/updater_test.go

RTann

Can you add comments to each iter/yield to make it clear what it's job is? I'm following through everything, and I find myself jumping between three files (updater.go, jsonblob.go, and Claircore's updatevulnerabilities.go/enrichment.go), and it's hard to follow

jvdm · 2024-05-29T22:38:59Z

@ross:

In the meantime, have you tried running this in a cluster yet? I'd love to see some metrics comparisons between the old way and this PR

Yes, see testing instructions. You can see metrics comparison here: https://redhat-internal.slack.com/archives/C020W1C61G8/p1713481469821819

jvdm · 2024-05-29T23:23:34Z

Can you add comments to each iter/yield to make it clear what it's job is?

Added some. Clarify if you're expecting something else, please.

One way to better understand these iterators is by replacing the it() calls with an equivalent for .. := range ... statement. For example, for op, recordIt := range jsonblob.Iterate(...) {.

Then replace each yield() with a vulnChan <- vuln (a channel that is being consumed by the callee). The semantic is of a coroutine.

Hypothetical example (never tested):

func (u *Updater) Import(ctx context.Context, in io.Reader) (err error) {
	ctx = zlog.ContextWithValues(ctx, "component", "matcher/updater/Updater.Import")
	iter, iterErr := jsonBlobIterateFunc(in)
	// For each update operation in the JSON blob file.
	for op, it := range iter {
		var ops map[string][]driver.UpdateOperation
		ops, err = u.store.GetUpdateOperations(ctx, op.Kind, op.Updater)
		if err != nil {
			break
		}
		for _, o := range ops[op.Updater] {
			// This only helps if updaters don't keep something that
			// changes in the fingerprint.
			if o.Fingerprint == op.Fingerprint {
				zlog.Info(ctx).
					Str("updater", op.Updater).
					Msg("fingerprint match, skipping")
				continue
			}
		}
		var ref uuid.UUID
		count := 0
		switch op.Kind {
		case driver.VulnerabilityKind:
			ref, err = u.store.UpdateVulnerabilitiesIter(ctx, op.Updater, op.Fingerprint, func(yieldC chan *claircore.Vulnerability)) {
				// For each vulnerability in the update operation.
				for vuln, _, yieldC := it {
					count++
					// Offer one vulnerability to the datastore iterator.
					yieldC <- vuln
				})
				if err := it.Err(); err != nil {
					yieldC <- err  // example, right?
				}
			})
		case driver.EnrichmentKind:
			ref, err = u.store.UpdateEnrichmentsIter(ctx, op.Updater, op.Fingerprint, func(yieldC chan *driver.EnrichmentRecord)) {
				// For each enrichment in the update operation.
				for _, e, yieldC := it {
					count++
					// Offer one enrichment to the datastore iterator.
					yieldC <- e
				})
				if err := iterErr(); err != nil {
					yieldC <- err
				}
			})
		default:
			zlog.Warn(ctx).Str("kind", string(op.Kind)).Msg("unknown kind, skipping")
		}
		if err != nil {
			err = fmt.Errorf("updating %s: %w", op.Kind, err)
			return false
		}
		zlog.Info(ctx).
			Str("updater", op.Updater).
			Str("kind", string(op.Kind)).
			Str("ref", ref.String()).
			Int("count", count).
			Msg("update imported")
	})
	if err := iterErr(); err != nil {
		return fmt.Errorf("iterating on the reader: %w", err)
	}
	if err != nil {
		return err
	}
	return nil
}

RTann

A few more comments to enhance my understanding as well as some nits

CHANGELOG.md

scanner/matcher/updater/vuln/updater.go

scanner/updater/jsonblob/jsonblob.go

scanner/matcher/updater/vuln/updater.go

scanner/updater/jsonblob/jsonblob.go

openshift-ci · 2024-05-31T21:28:27Z

@jvdm: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/ocp-4-candidate-scanner-v4-tests	`dd6b5bc`	link	false	`/test ocp-4-candidate-scanner-v4-tests`
ci/prow/ocp-4-candidate-operator-e2e-tests	`dd6b5bc`	link	false	`/test ocp-4-candidate-operator-e2e-tests`
ci/prow/ocp-4-candidate-qa-e2e-tests	`dd6b5bc`	link	false	`/test ocp-4-candidate-qa-e2e-tests`
ci/prow/gke-qa-e2e-tests	`9798cc8`	link	false	`/test gke-qa-e2e-tests`
ci/prow/gke-nongroovy-e2e-tests	`9798cc8`	link	false	`/test gke-nongroovy-e2e-tests`
ci/prow/gke-operator-e2e-tests	`9798cc8`	link	false	`/test gke-operator-e2e-tests`
ci/prow/gke-scanner-v4-tests	`9798cc8`	link	false	`/test gke-scanner-v4-tests`
ci/prow/ocp-4-15-scanner-v4-tests	`9798cc8`	link	false	`/test ocp-4-15-scanner-v4-tests`
ci/prow/ocp-4-15-operator-e2e-tests	`9798cc8`	link	false	`/test ocp-4-15-operator-e2e-tests`
ci/prow/ocp-4-15-qa-e2e-tests	`9798cc8`	link	false	`/test ocp-4-15-qa-e2e-tests`
ci/prow/ocp-4-12-operator-e2e-tests	`9798cc8`	link	false	`/test ocp-4-12-operator-e2e-tests`
ci/prow/ocp-4-12-qa-e2e-tests	`9798cc8`	link	false	`/test ocp-4-12-qa-e2e-tests`
ci/prow/ocp-4-12-scanner-v4-tests	`9798cc8`	link	false	`/test ocp-4-12-scanner-v4-tests`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Signed-off-by: J. Victor Martins <jvdm@sdf.org>

jvdm requested a review from a team as a code owner May 18, 2024 02:27

github-actions bot added the area/scanner label May 18, 2024

jvdm force-pushed the jvdm/multibundle/iter branch 5 times, most recently from db5c7f2 to 61c4029 Compare May 22, 2024 01:39

jvdm changed the title ~~feat(scanner): Write vulnerabilities and enrichments using streaming~~ feat(scanner): Write vulnerabilities and enrichments using iterators May 22, 2024

jvdm force-pushed the jvdm/multibundle/iter branch 4 times, most recently from cf85fba to 6cf35d7 Compare May 24, 2024 18:52

jvdm force-pushed the jvdm/multibundle/iter branch from 6cf35d7 to f895924 Compare May 29, 2024 19:07

RTann reviewed May 29, 2024

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

scanner/matcher/updater/vuln/updater.go Outdated Show resolved Hide resolved

scanner/matcher/updater/vuln/updater.go Show resolved Hide resolved

scanner/updater/jsonblob/jsonblob.go Show resolved Hide resolved

RTann requested review from BradLugo, daynewlee and dcaravel May 29, 2024 22:08

RTann reviewed May 29, 2024

View reviewed changes

scanner/matcher/updater/vuln/updater_test.go Outdated Show resolved Hide resolved

RTann reviewed May 29, 2024

View reviewed changes

jvdm requested a review from RTann May 29, 2024 22:52

jvdm force-pushed the jvdm/multibundle/iter branch from f895924 to 4ead053 Compare May 29, 2024 23:13

jvdm force-pushed the jvdm/multibundle/iter branch from 4ead053 to dd6b5bc Compare May 29, 2024 23:35

RTann reviewed May 31, 2024

View reviewed changes

jvdm force-pushed the jvdm/multibundle/iter branch from dd6b5bc to 9798cc8 Compare May 31, 2024 21:25

jvdm requested a review from RTann May 31, 2024 21:26

jvdm added 3 commits May 31, 2024 14:32

feat(scanner): Bump claircore to v1.5.28

9a829a0

Signed-off-by: J. Victor Martins <jvdm@sdf.org>

feat(scanner): Copy jsonblob w/ iter from ClairCore

93354ab

Signed-off-by: J. Victor Martins <jvdm@sdf.org>

feat(scanner): updater: Import using iterators

90bff64

Signed-off-by: J. Victor Martins <jvdm@sdf.org>

jvdm force-pushed the jvdm/multibundle/iter branch from 1c824c3 to 90bff64 Compare May 31, 2024 21:33

RTann approved these changes May 31, 2024

View reviewed changes

jvdm enabled auto-merge (squash) May 31, 2024 22:07

jvdm removed request for BradLugo, daynewlee and dcaravel May 31, 2024 22:24

jvdm merged commit c9fd6df into master May 31, 2024
59 of 69 checks passed

jvdm deleted the jvdm/multibundle/iter branch May 31, 2024 22:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(scanner): Write vulnerabilities and enrichments using iterators #11168

feat(scanner): Write vulnerabilities and enrichments using iterators #11168

jvdm commented May 18, 2024 •

edited

rhacs-bot commented May 18, 2024 •

edited

codecov bot commented May 22, 2024 •

edited

RTann left a comment

RTann left a comment

jvdm commented May 29, 2024

jvdm commented May 29, 2024 •

edited

RTann left a comment

openshift-ci bot commented May 31, 2024 •

edited

feat(scanner): Write vulnerabilities and enrichments using iterators #11168

feat(scanner): Write vulnerabilities and enrichments using iterators #11168

Conversation

jvdm commented May 18, 2024 • edited

Description

Checklist

Testing Performed

Here I tell how I validated my change

Reminder for reviewers

rhacs-bot commented May 18, 2024 • edited

codecov bot commented May 22, 2024 • edited

Codecov Report

RTann left a comment

Choose a reason for hiding this comment

RTann left a comment

Choose a reason for hiding this comment

jvdm commented May 29, 2024

jvdm commented May 29, 2024 • edited

RTann left a comment

Choose a reason for hiding this comment

openshift-ci bot commented May 31, 2024 • edited

jvdm commented May 18, 2024 •

edited

rhacs-bot commented May 18, 2024 •

edited

codecov bot commented May 22, 2024 •

edited

jvdm commented May 29, 2024 •

edited

openshift-ci bot commented May 31, 2024 •

edited