New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Abstract upstream package before matching #607
Conversation
const ( | ||
// this is the full set of data shapes that can be represented within the pkg.Package.Metadata field | ||
|
||
UnknownMetadataType MetadataType = "UnknownMetadata" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be just serialized during output based on the actual metadata data type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how this originally worked, yes. I decided to make a more firm division of responsibilities here between syft and grype since the data shapes represented by these constants are different depending which application being referred to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I don't see anything wrong here but a few general questions:
- Is there a reason to use
MetadataType
instead of relying on the Go types, like using:
switch metadata.(type) {
case JavaMetadata:
...
}
This tripped me up a bit and I had to debug a small amount to realize just setting the metadata was not enough. I could not really understand why a variable is used to denote a type when the type information is already available (and could be serialized/deserialized without issue based on the Go type, I suspect).
-
Would it be better to keep all the PURL parsing in the Syft decoding process? It seems wrong that we might generate a format and not be able to decode it in a meaningful manner but then handle it in Grype. That said, I don't really see a problem having some extra handling to just try to handle whatever we've been given. For Syft, SPDX, CycloneDX and any other supported formats, I would hope we're decoding these properly and won't need it, though.
-
Is the "upstream" abstraction shared across enough matchers that it should be top-level data? It seems as though it is specific to Apk, Dpkg, and Rpm matchers but not the others. Looking at the dataFromPkg function, it seems like it's really all just different types of metadata much like it was factored before. Might the upstreams list just replace the specific metadata types for those matchers instead of being a top-level thing? Or keep each of the metadata types but add a list of upstreams so the functions to deal with them can remain consistent for the aforementioned 3 matchers?
// version = "2.17.2" (or, if there's an epoch, we'd expect a value like "4:2.17.2") | ||
// release = "12.28.el6_9.2" | ||
// arch = "src" | ||
var rpmPackageNamePattern = regexp.MustCompile(`^(?P<name>.*)-(?P<version>.*)-(?P<release>.*)\.(?P<arch>[a-zA-Z][^.]+)(\.rpm)$`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there are a lot of type specific things added here; maybe this should predominantly be factored out to the subpackages under the matcher package or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there are a lot of type specific things added here
Indeed, this was intentional. The observation was that by moving type specific handling as much as possible upstream in processing (in this case always during package creation) downstream processing becomes simplified.
The reason for the type-specific logic is to craft the UpstreamPackage
(which is agnostic to the package type) as early as possible. Today (before this PR) we push a lot of that processing into matcher sub-packages, which was causing lots of type assertions to get specific upstream package info out in order to do a fairly standard DB lookup. It became apparent that making the Upstream Package agnostic to the package type (information is inferred from the parent package) made it easier to do this logic when we are creating the Grype Package based off of the Syft package just after SBOM decoding or Syft cataloging (thereby making downstream processing in the matchers simpler).
I don't think I'd be able to put the rpm-specific logic for parsing into the RPM matcher package since that would introduce a package cycle. However, I could move this logic into another new package, but it isn't clear what that new package's purpose (thus name) would be other than something like rpm-utils
. For that reason I left this logic unexported in the pkg
package (closest to where it's used).
@kzantow :
absolutely, I'll make that update (good call)
It's true that this abstraction is not universal to all package types in grype, however, it is only these ecosystems (+nvd) where grype cares about looking for upstream packages. It may be one day that we add matching for additional upstream packages for other matchers, but we just haven't done so yet.
Yes, however, the upstreams processing is new and metadata processing for DPKG and APK was removed. This functionality was decomposed into a separate function was all.
That's exactly what this change aims to do. This PR already removes all of the package metadata's where possible (DPKG and APK metadata structs were flat out deleted). I could have gone further and removed the metadata for RPMdb, I can give that a shot and see what shakes out? I think this would mean a Syft update (your PR would be a good spot, I can add a commit?). I didn't look at the possibility of removing the Java metadata struct, but I feel that could be reserved for it's own PR. |
7b60580
to
e62f69a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I had a thought I'll bring up elsewhere about handling the metadata across things in a more uniform manner, I think this moves toward that goal 👍
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
8a73518
to
34df3b3
Compare
@kzantow I've made a couple of additions to the PR after the initial review, the diff for just these changes is here: https://github.com/anchore/grype/pull/607/files/816944a06810eb92ebc8121ee74b58f4b0bcf860..8dca42152af350d9fec8f2f5bf0800dffb7aa6d4 Additions:
|
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
34df3b3
to
8dca421
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -16,6 +16,8 @@ SUCCESS := $(BOLD)$(GREEN) | |||
# the quality gate lower threshold for unit test total % coverage (by function statements) | |||
COVERAGE_THRESHOLD := 47 | |||
BOOTSTRAP_CACHE="c7afb99ad" | |||
INTEGRATION_CACHE_BUSTER="894d8ca" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this cache busting for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ported this functionality from syft --essentially gives us the ability to bust the CI cache ourselves instead of needing to change the underlying integration test fixtures fictitiously. In this case I wanted to ensure that the images used for integration tests was refreshed.
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
This PR makes the following adjustments:
attempts to extract upstream OS package information from pURLs when rich metadata is not available, which aids in package indirect matching for APK, RPM, and DPKG package types.based on PR review, this is being moved further back in processing into syft, probably in Improve SPDX decoding functionality syft#738.pkg.Package.Upstreams[]
.pkg.*Metadata
structs since they can be represented aspkg.UpstreamPackage
with no loss of information, and 2) simplifies the matching process for matchers that do matching by source-package indirection by sharing similar code (since they now share the sameUpstreamPackage
abstraction).pkg.MetadataType
in favor of a new Grypepkg.Metadata
type. This further enforces the bound context between Syft and Grype and in this case is better since theMetadataType
hints at the shape of thepkg.Package.Metadata
field, which no longer has any shared struct definitions with Syft.MetadataType
in the json outputFixes #395