Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(scanner): Write vulnerabilities and enrichments using iterators #11168

Merged
merged 3 commits into from
May 31, 2024

Conversation

jvdm
Copy link
Contributor

@jvdm jvdm commented May 18, 2024

Description

Adopt ClairCore's iterator interface to update vulnerabilities and enrichments. Create a fork of the JSON blob reader to allow individual record reading per operation in the JSON blob files (where vulnerabilities are stored in our bundles). Create our own "offline importer" method, Updater.Import(), which leverages the new JSON blob interface and the ClairCore datastore iterators.

The result is that the Matcher keeps one vulnerability in memory at a time, and the datastore code will batch vulnerabilities before sending them to the DB, effectively reducing memory consumption to a minimum during vulnerability updates.

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required

Testing Performed

Here I tell how I validated my change

Deploy to openshift, enable ROX_SCANNER_V4_MULTI_BUNDLE=true, change vulnerability URL to https://definitions.stackrox.io/v4/vulnerability-bundles/dev/vulnerabilities.zip, and verified successful vulnerability bundle download and writes to the DB.

Reminder for reviewers

In addition to reviewing code here, reviewers must also review testing and request further testing in case the
performed one does not seem sufficient. As a reviewer, you must not approve the change until you understand the
performed testing and you are satisfied with it.

@jvdm jvdm requested a review from a team as a code owner May 18, 2024 02:27
@rhacs-bot
Copy link
Contributor

rhacs-bot commented May 18, 2024

Images are ready for the commit at 90bff64.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.4.x-835-g1c824c3b32.

@jvdm jvdm force-pushed the jvdm/multibundle/iter branch 5 times, most recently from db5c7f2 to 61c4029 Compare May 22, 2024 01:39
Copy link

codecov bot commented May 22, 2024

Codecov Report

Attention: Patch coverage is 76.81159% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 48.00%. Comparing base (eb354f0) to head (dd6b5bc).

Files Patch % Lines
scanner/matcher/updater/vuln/updater.go 76.81% 12 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11168      +/-   ##
==========================================
+ Coverage   47.98%   48.00%   +0.01%     
==========================================
  Files        2330     2330              
  Lines      166761   166827      +66     
==========================================
+ Hits        80022    80077      +55     
- Misses      80386    80393       +7     
- Partials     6353     6357       +4     
Flag Coverage Δ
go-unit-tests 48.00% <76.81%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jvdm jvdm changed the title feat(scanner): Write vulnerabilities and enrichments using streaming feat(scanner): Write vulnerabilities and enrichments using iterators May 22, 2024
@jvdm jvdm force-pushed the jvdm/multibundle/iter branch 4 times, most recently from cf85fba to 6cf35d7 Compare May 24, 2024 18:52
Copy link
Contributor

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial notes. Honestly this iterator stuff is looking pretty confusing, and it may take me a while to fully understand what is going on and why this may work.

In the meantime, have you tried running this in a cluster yet? I'd love to see some metrics comparisons between the old way and this PR

CHANGELOG.md Outdated Show resolved Hide resolved
scanner/matcher/updater/vuln/updater.go Outdated Show resolved Hide resolved
scanner/matcher/updater/vuln/updater.go Show resolved Hide resolved
scanner/updater/jsonblob/jsonblob.go Show resolved Hide resolved
Copy link
Contributor

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments to each iter/yield to make it clear what it's job is? I'm following through everything, and I find myself jumping between three files (updater.go, jsonblob.go, and Claircore's updatevulnerabilities.go/enrichment.go), and it's hard to follow

@jvdm
Copy link
Contributor Author

jvdm commented May 29, 2024

@ross:

In the meantime, have you tried running this in a cluster yet? I'd love to see some metrics comparisons between the old way and this PR

Yes, see testing instructions. You can see metrics comparison here: https://redhat-internal.slack.com/archives/C020W1C61G8/p1713481469821819

@jvdm jvdm requested a review from RTann May 29, 2024 22:52
@jvdm
Copy link
Contributor Author

jvdm commented May 29, 2024

Can you add comments to each iter/yield to make it clear what it's job is?

Added some. Clarify if you're expecting something else, please.

One way to better understand these iterators is by replacing the it() calls with an equivalent for .. := range ... statement. For example, for op, recordIt := range jsonblob.Iterate(...) {.

Then replace each yield() with a vulnChan <- vuln (a channel that is being consumed by the callee). The semantic is of a coroutine.

Hypothetical example (never tested):

func (u *Updater) Import(ctx context.Context, in io.Reader) (err error) {
	ctx = zlog.ContextWithValues(ctx, "component", "matcher/updater/Updater.Import")
	iter, iterErr := jsonBlobIterateFunc(in)
	// For each update operation in the JSON blob file.
	for op, it := range iter {
		var ops map[string][]driver.UpdateOperation
		ops, err = u.store.GetUpdateOperations(ctx, op.Kind, op.Updater)
		if err != nil {
			break
		}
		for _, o := range ops[op.Updater] {
			// This only helps if updaters don't keep something that
			// changes in the fingerprint.
			if o.Fingerprint == op.Fingerprint {
				zlog.Info(ctx).
					Str("updater", op.Updater).
					Msg("fingerprint match, skipping")
				continue
			}
		}
		var ref uuid.UUID
		count := 0
		switch op.Kind {
		case driver.VulnerabilityKind:
			ref, err = u.store.UpdateVulnerabilitiesIter(ctx, op.Updater, op.Fingerprint, func(yieldC chan *claircore.Vulnerability)) {
				// For each vulnerability in the update operation.
				for vuln, _, yieldC := it {
					count++
					// Offer one vulnerability to the datastore iterator.
					yieldC <- vuln
				})
				if err := it.Err(); err != nil {
					yieldC <- err  // example, right?
				}
			})
		case driver.EnrichmentKind:
			ref, err = u.store.UpdateEnrichmentsIter(ctx, op.Updater, op.Fingerprint, func(yieldC chan *driver.EnrichmentRecord)) {
				// For each enrichment in the update operation.
				for _, e, yieldC := it {
					count++
					// Offer one enrichment to the datastore iterator.
					yieldC <- e
				})
				if err := iterErr(); err != nil {
					yieldC <- err
				}
			})
		default:
			zlog.Warn(ctx).Str("kind", string(op.Kind)).Msg("unknown kind, skipping")
		}
		if err != nil {
			err = fmt.Errorf("updating %s: %w", op.Kind, err)
			return false
		}
		zlog.Info(ctx).
			Str("updater", op.Updater).
			Str("kind", string(op.Kind)).
			Str("ref", ref.String()).
			Int("count", count).
			Msg("update imported")
	})
	if err := iterErr(); err != nil {
		return fmt.Errorf("iterating on the reader: %w", err)
	}
	if err != nil {
		return err
	}
	return nil
}

Copy link
Contributor

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments to enhance my understanding as well as some nits

CHANGELOG.md Show resolved Hide resolved
scanner/matcher/updater/vuln/updater.go Outdated Show resolved Hide resolved
scanner/matcher/updater/vuln/updater.go Show resolved Hide resolved
scanner/matcher/updater/vuln/updater.go Show resolved Hide resolved
scanner/updater/jsonblob/jsonblob.go Show resolved Hide resolved
scanner/updater/jsonblob/jsonblob.go Show resolved Hide resolved
scanner/matcher/updater/vuln/updater.go Show resolved Hide resolved
scanner/updater/jsonblob/jsonblob.go Show resolved Hide resolved
@jvdm jvdm requested a review from RTann May 31, 2024 21:26
Copy link

openshift-ci bot commented May 31, 2024

@jvdm: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ocp-4-candidate-scanner-v4-tests dd6b5bc link false /test ocp-4-candidate-scanner-v4-tests
ci/prow/ocp-4-candidate-operator-e2e-tests dd6b5bc link false /test ocp-4-candidate-operator-e2e-tests
ci/prow/ocp-4-candidate-qa-e2e-tests dd6b5bc link false /test ocp-4-candidate-qa-e2e-tests
ci/prow/gke-qa-e2e-tests 9798cc8 link false /test gke-qa-e2e-tests
ci/prow/gke-nongroovy-e2e-tests 9798cc8 link false /test gke-nongroovy-e2e-tests
ci/prow/gke-operator-e2e-tests 9798cc8 link false /test gke-operator-e2e-tests
ci/prow/gke-scanner-v4-tests 9798cc8 link false /test gke-scanner-v4-tests
ci/prow/ocp-4-15-scanner-v4-tests 9798cc8 link false /test ocp-4-15-scanner-v4-tests
ci/prow/ocp-4-15-operator-e2e-tests 9798cc8 link false /test ocp-4-15-operator-e2e-tests
ci/prow/ocp-4-15-qa-e2e-tests 9798cc8 link false /test ocp-4-15-qa-e2e-tests
ci/prow/ocp-4-12-operator-e2e-tests 9798cc8 link false /test ocp-4-12-operator-e2e-tests
ci/prow/ocp-4-12-qa-e2e-tests 9798cc8 link false /test ocp-4-12-qa-e2e-tests
ci/prow/ocp-4-12-scanner-v4-tests 9798cc8 link false /test ocp-4-12-scanner-v4-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jvdm added 3 commits May 31, 2024 14:32
Signed-off-by: J. Victor Martins <jvdm@sdf.org>
Signed-off-by: J. Victor Martins <jvdm@sdf.org>
Signed-off-by: J. Victor Martins <jvdm@sdf.org>
@jvdm jvdm enabled auto-merge (squash) May 31, 2024 22:07
@jvdm jvdm merged commit c9fd6df into master May 31, 2024
59 of 69 checks passed
@jvdm jvdm deleted the jvdm/multibundle/iter branch May 31, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants