Support CycloneDX as SBOM input to grype #481

Closed
hectorj2f opened this issue Oct 26, 2021 · 11 comments · Fixed by #650
Labels: enhancement (New feature or request), format:cyclonedx (CycloneDX related enhancement or bug)

Comments

@hectorj2f

What would you like to be added:
We use grype, but our SBOM files are not always generated by Syft. We'd like to understand what is needed for grype to accept the standard CycloneDX format.

Why is this needed:
To accept a commonly used format for BOM generation.

Additional context:

We saw in the README that this is part of the future plans. However, the proliferation of tools that create SBOM files in standard formats is a blocker for our continued use of grype.

@hectorj2f hectorj2f added the enhancement New feature or request label Oct 26, 2021
@luhring
Contributor

luhring commented Oct 26, 2021

Hi @hectorj2f — This makes sense, and it's something we definitely want to add soon.

There's a related effort to refactor how we ingest and output SBOM formats that we're working on currently, and we'll be able to leverage this work to extend our support of various formats. cc: @wagoodman 💪

Related issues:

@hectorj2f
Author

@luhring @wagoodman We'd like to explore the possibilities here to understand what needs to be done (and a potential ETA), even if we need to help.
Otherwise, we'd like to know whether converting our CycloneDX documents into a Syft-friendly CycloneDX format would be an option. Thanks in advance.

@wagoodman
Contributor

@hectorj2f Let me at least highlight the inputs to the vulnerability matching process, and then we can figure out how they map onto various SBOM formats (e.g. CycloneDX, SPDX, etc).

Today we require a few fields for each package:

  • package name (artifacts[].name)
  • package version (artifacts[].version)
  • package type (artifacts[].type)

The type is important because it determines which matcher object to use ([1] [2]). Next up are the name and version; these are table stakes for any vulnerability matching process, really.
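As a rough illustration of the three required fields and of type-based matcher selection, here is a minimal sketch. The matcher labels and the `select_matcher` helper are hypothetical and are not grype's actual API; only the three field names come from the Syft JSON paths listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Package:
    name: str     # artifacts[].name
    version: str  # artifacts[].version
    type: str     # artifacts[].type


# Hypothetical matcher labels, for illustration only.
MATCHER_BY_TYPE = {
    "rpm": "rpm-matcher",
    "deb": "dpkg-matcher",
    "apk": "apk-matcher",
    "npm": "javascript-matcher",
    "python": "python-matcher",
}


def select_matcher(pkg: Package) -> Optional[str]:
    """Return the matcher label for a package type, or None if none is known."""
    return MATCHER_BY_TYPE.get(pkg.type)
```

A package whose type is not in the table gets no matcher in this sketch, which is why an SBOM without a usable type is hard to match.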

Next, more nuance is needed to determine which ecosystem the matcher should search within. For OS packages (RPMs, DEBs, and Alpine APKs) we need to know the Linux distribution name and version (e.g. rhel:8). This narrows the search to a subset of vulnerabilities from the whole database (and minimizes false positives). This is encapsulated in the Syft JSON output as the distro.name, distro.version, and distro.idLike fields (which correspond to /etc/os-release fields).
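To make the distro fields concrete, here is a small sketch that reads os-release style text and derives a namespace such as rhel:8. Mapping `ID`, `VERSION_ID`, and `ID_LIKE` onto name, version, and idLike is an assumption on my part, as is trimming the version to its major component; the helper names are hypothetical.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Extract distro name, version, and idLike from os-release style text."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # removes shell-style quoting around the value
        fields[key.strip()] = parts[0] if parts else ""
    return {
        "name": fields.get("ID", ""),
        "version": fields.get("VERSION_ID", ""),
        "idLike": fields.get("ID_LIKE", ""),
    }


def distro_namespace(distro: dict) -> str:
    """Build a 'name:major' namespace (assumed convention, e.g. rhel:8)."""
    major = distro["version"].split(".", 1)[0]
    return f"{distro['name']}:{major}"
```

For example, text containing `ID="rhel"` and `VERSION_ID="8.4"` yields the namespace `rhel:8`.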

With all of the above fields in place you can get basic matching working. However, the results will be lossy if the below fields are not also included.

The OS packages also have optional information about upstream build dependencies for a package ("source" packages... [apk] artifacts[].metadata.originPackage, [dpkg] artifacts[].metadata.source, [rpm] artifacts[].metadata.sourceRpm). This information is used to search for vulnerabilities in upstream packages that could affect the downstream package installed on the system.
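One way to picture the upstream lookup: search under both the installed package name and its source package name. The sketch below assumes the apk and dpkg fields hold a bare package name and that `sourceRpm` follows the usual `name-version-release.src.rpm` file naming; the helper names and the sample values are hypothetical.

```python
from typing import List, Optional


def upstream_name(artifact: dict) -> Optional[str]:
    """Return the upstream ("source") package name, if the metadata has one."""
    meta = artifact.get("metadata") or {}
    pkg_type = artifact.get("type")
    if pkg_type == "apk":
        return meta.get("originPackage") or None
    if pkg_type == "deb":
        return meta.get("source") or None
    if pkg_type == "rpm":
        source_rpm = meta.get("sourceRpm") or ""
        if source_rpm.endswith(".src.rpm"):
            nvr = source_rpm[: -len(".src.rpm")]
            parts = nvr.rsplit("-", 2)  # name, version, release
            if len(parts) == 3:
                return parts[0]
    return None


def search_names(artifact: dict) -> List[str]:
    """Names to search: the package itself, then its upstream if different."""
    names = [artifact["name"]]
    upstream = upstream_name(artifact)
    if upstream and upstream not in names:
        names.append(upstream)
    return names
```

Without the source metadata, only the first name is searched, which is where the lossy results come from.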

Lastly, there are some language-specific properties that are important. Today this is restricted to only some Java fields... essentially artifacts[].metadata.artifactID and artifacts[].metadata.groupID.

Above are all of the fields that are used as input into the vulnerability matching process. An input SBOM that has a subset of these fields may not produce complete results.

As @luhring mentioned, we're planning to add support for ingesting SPDX 2.2 XML/JSON and CycloneDX 1.2 XML documents with these format upgrades. We're in the middle of upgrading syft to include decoding capabilities for common SBOM formats that we can leverage in grype. Progress on implementing encoding can be tracked in anchore/syft#395. I'm still working out the details on decoding, or rather, on expanding encoding and providing decoders that map as much information as possible into each target format in a way that is "kosher" for that format. I had a draft PR (that I closed) that implemented SPDX JSON encoding/decoding that was lossless relative to Syft JSON (used for grype input), but it wasn't "kosher" relative to the SPDX JSON target format: encoding went "out of spec" and decoding leveraged these fields, which is not desirable.

This is all pretty high priority right now; we're implementing changes that should "unlock" more of this work in the coming weeks. Hopefully this gets you going in the meantime! (Also shout out if you have more questions on this, happy to answer!)

@cjnosal
Contributor

cjnosal commented Oct 26, 2021

That aligns with my experience when we tried to generate our own syft json (from buildpack manifests):

  1. determining what syft type to use
  2. creating valid CPEs from partial information
  3. specifying a placeholder distro
  4. skipping the optional source fields

CycloneDX providing the purl and/or CPE would alleviate 1 and 2 (I'm hoping the purl's scheme:type would be mappable to a syft type).

3 and 4 might be inferred from a few places in the document (metadata, nested components, dependencies, compositions ...) but I suppose that's where the "kosher" considerations come into play, along with different BoM generators making different in-spec choices.
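To illustrate the hope in point 1, here is a sketch of mapping a purl's type to a syft package type. The mapping table is an assumption for illustration, not a table taken from syft, and `syft_type_for` is a hypothetical helper; only the `pkg:type/...` layout comes from the purl format itself.

```python
from typing import Optional

# Assumed purl-type to syft-type mapping, for illustration only.
PURL_TYPE_TO_SYFT_TYPE = {
    "deb": "deb",
    "rpm": "rpm",
    "npm": "npm",
    "pypi": "python",
    "gem": "gem",
    "maven": "java-archive",
    "golang": "go-module",
}


def purl_type(purl: str) -> str:
    """Return the lowercased type component of a package URL."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")
    return rest.lstrip("/").split("/", 1)[0].lower()


def syft_type_for(purl: str) -> Optional[str]:
    """Map a purl to a syft package type, or None when no mapping is known."""
    return PURL_TYPE_TO_SYFT_TYPE.get(purl_type(purl))
```

Types without an entry return None, which a decoder could treat as "type unknown" rather than guessing.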

As CPE is deprecated in CycloneDX, I was also wondering whether the store adapter would need a GetByPURL method, and whether that would be feasible?

@hectorj2f
Author

hectorj2f commented Oct 27, 2021

Thanks @wagoodman, we really appreciate the details you shared with us. We are gonna look at them and try to come up with more questions. In the meantime, I think @xtreme-conor-nosal (one of our engineers) had a question, if you could help him 🙏🏻 :).

@xtreme-conor-nosal Based on my understanding and on @wagoodman's draft PR https://github.com/anchore/syft/pull/578/files#diff-e7c9d669021517e45a99d7b97c892cd108728b70121931f94931a6f45882e2c6R29, I guess the answer would be affirmative.

@wagoodman
Contributor

How could I have forgotten about providing CPEs on each package (artifacts[].cpes) for matching against NVD? Thanks @xtreme-conor-nosal for that addition 👍

I think that GetByPURL on the store adapter is a great idea! There is a rough 1:1 match of package types to matcher ecosystems, so I don't think that will be an issue. I think the larger problem would be matching against NVD, which typically needs an accurate product and vendor to get relevant matches. That is, since NVD is indexed by CPEs, we still need a good CPE as input to start the matching process, which hints at generating CPEs from a given pURL. This might be a good enhancement in syft that can be exported for use in grype by the newly proposed GetByPURL method.
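A rough sketch of what generating candidate CPEs from a pURL could look like. This is a guessing heuristic under stated assumptions, not syft's implementation: it tries the last namespace segment and the package name as vendor candidates, and it does not apply the escaping that CPE 2.3 requires for special characters. `parse_purl` and `candidate_cpes` are hypothetical helpers.

```python
from typing import List
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split a package URL into type, namespace segments, name, and version."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")
    # Drop the subpath and qualifiers, then any leading slashes.
    rest = rest.split("#", 1)[0].split("?", 1)[0].lstrip("/")
    if "@" in rest:
        path, _, version = rest.rpartition("@")
    else:
        path, version = rest, ""
    segments = [unquote(s) for s in path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"package URL needs a type and a name: {purl!r}")
    return {
        "type": segments[0].lower(),
        "namespace": segments[1:-1],
        "name": segments[-1],
        "version": unquote(version),
    }


def candidate_cpes(purl: str) -> List[str]:
    """Guess CPE 2.3 strings for a purl (vendor guesses are heuristic)."""
    parsed = parse_purl(purl)
    product = parsed["name"].lower()
    version = parsed["version"] or "*"
    vendors = []
    if parsed["namespace"]:
        vendors.append(parsed["namespace"][-1].lower())
    if product not in vendors:
        vendors.append(product)
    return [f"cpe:2.3:a:{v}:{product}:{version}:*:*:*:*:*:*:*" for v in vendors]
```

The vendor guess is the weak point: when neither candidate matches the vendor NVD actually uses, no CPE match will be found, which is exactly the accuracy problem described above.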

@wagoodman
Contributor

wagoodman commented Dec 17, 2021

There is a compromise here between getting the feature in (enabling input from SBOMs in different formats) and the quality of the vulnerability matching. There is a good chance that matching quality will be lower for other formats until we better understand how the information can be encoded in them (SPDX and CycloneDX). To make some forward progress, I think it makes sense to log warnings in cases where we know the matching could be lower. That is, we shouldn't require matching quality on par with Syft JSON input in order for this issue to be complete.

@hectorj2f
Author

@wagoodman Yes, that makes sense to me. I am happy to get involved in a WG to investigate what we need from these formats to get the same or better accuracy.

@wagoodman wagoodman added the format:cyclonedx CycloneDX related enhancement or bug label Dec 21, 2021
@samj1912
Contributor

samj1912 commented Jan 6, 2022

Related - anchore/syft#710 (comment)

@kzantow
Contributor

kzantow commented Jan 14, 2022

Notes from an offline conversation with @wagoodman

In order to support import of alternative SBOM formats, we will need a particular set of data for Grype to perform ideal matching: package name, package version, package vendor, OS distribution, and CPEs (I may have missed something here). However, it is likely that imported SBOMs will be missing some of this information and we'd still like for Grype to be as useful as possible. Some ideas for handling this are:

package name & package version: these are effectively required, and we will simply return an error if either is missing for any package

package vendor: if included, this could contribute to CPE generation when no CPEs are provided

CPEs if missing:

  • simply notify the user that no CPE matching is happening
  • automatically attempt to generate these based on the available package information
  • notify the user no CPE matching is happening but also provide a command line flag to automatically generate them

OS distribution:

  • notify the user no distro matching is happening
  • notify the user no distro matching happens and provide a flag to search across all distros (this may be very noisy)
  • allow the user to specify a distro hint via normal configuration
  • enhance PURLs to include enough information to reasonably determine the distro
  • accept a distribution hint in some metadata location as idiomatic to the input format as possible

In all of these cases, we should probably document how matching happens in the event of missing data and how a vendor could provide this information so Grype would have improved matching.
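The simplest combination of these ideas (error on missing name or version, warn on missing CPEs or distro) could be sketched as follows. This is an illustration of the policy only; `check_package`, `MissingRequiredField`, and the warning wording are hypothetical and not part of grype.

```python
from typing import List, Optional

# Package types that need distro information for matching (assumed set).
OS_PACKAGE_TYPES = {"rpm", "deb", "apk"}


class MissingRequiredField(ValueError):
    """Raised when a package lacks a field that matching cannot do without."""


def check_package(pkg: dict, distro: Optional[dict] = None) -> List[str]:
    """Raise on missing required fields; return warnings for degraded matching."""
    for field in ("name", "version"):
        if not pkg.get(field):
            raise MissingRequiredField(f"package is missing required field {field!r}")
    warnings = []
    if not pkg.get("cpes"):
        warnings.append(f"{pkg['name']}: no CPEs provided, CPE matching skipped")
    if pkg.get("type") in OS_PACKAGE_TYPES and not distro:
        warnings.append(f"{pkg['name']}: no distro provided, distro matching skipped")
    return warnings
```

Returning the warnings rather than printing them keeps the policy separate from how the caller chooses to log or display them.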

@cjnosal
Contributor

cjnosal commented Jan 14, 2022

vendor should map to the author/publisher/supplier fields of a component (if provided)

For the distribution, is a version required, or is "distro family" a useful-enough hint? (e.g. if component purls are present and have a pkg:deb/debian prefix as syft currently outputs)
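As a sketch of the "distro family" hint, the namespace of OS-package purls could be counted and the most common one used. The set of OS purl types and the `infer_distro_family` helper are assumptions for illustration; whether a family without a version is enough for matching is the open question above.

```python
from collections import Counter
from typing import Iterable, Optional

# purl types treated as OS packages (assumed set).
OS_PURL_TYPES = {"deb", "rpm", "apk"}


def infer_distro_family(purls: Iterable[str]) -> Optional[str]:
    """Return the most common namespace among OS-package purls, if any."""
    families = Counter()
    for purl in purls:
        if not purl.startswith("pkg:"):
            continue
        # Keep only the type/namespace/name path, dropping version and qualifiers.
        path = purl[len("pkg:"):].lstrip("/").split("@", 1)[0].split("?", 1)[0]
        segments = path.split("/")
        if len(segments) >= 3 and segments[0].lower() in OS_PURL_TYPES:
            families[segments[1].lower()] += 1
    if not families:
        return None
    return families.most_common(1)[0][0]
```

For a document whose components carry `pkg:deb/debian/...` purls, this returns `debian`; it returns None when no OS-package purls are present.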

I like the approach of doing the best with what we have, and log warnings if appropriate.

6 participants