Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new format abstraction #543

Closed
wants to merge 17 commits into from
Closed

Add new format abstraction #543

wants to merge 17 commits into from

Conversation

wagoodman
Copy link
Contributor

@wagoodman wagoodman commented Oct 11, 2021

This is not ready for final review yet for a few reasons:

  • implementing multiple formats should be done with multiple PRs
  • there is an unnecessary breaking change with json encoding (remove scope) that can be deferred
  • needs a good rebase

Partially addresses anchore/grype#395

Today we have the presenter abstraction to write out internal SBOM results to a particular format (e.g. JSON, table, SPDX, CycloneDX). All model details about the format itself is contained under the presenter objects themselves. This abstraction was never meant to encapsulate handling intake (parsing) of these formats. As we add more formats it would be ideal to be able to parse these formats for use in other tools (such as syft). For this reason this PR adds a new format abstraction for specifying encoding/decoding details for a particular SBOM format (and the presenter can use format encoders to deal with presentation concerns).

The first commit of this PR adds the new format abstractions:

  • type Encoder ...: a function signature for encoding a given set of SBOM objects into bytes that are written to a writer.
  • type Decoder ...: a function signature for decoding an SBOM from a reader and returning SBOM objects.
  • type Validator ...: a function signature for observing the bytes of an SBOM document via a reader and returns any errors if the given document is not of a specific format.
  • type Format ...: ties together the above bits of functionality into a single object tied to a specific format.Option (with helper functions)

This new pattern allows for:

  • specifying not only encoding, but also decoding of a format
  • separates the concerns of presenting to a writer (behavior oriented) from specifying the SBOM format document shape (data oriented).
  • allows for "one way" and "two way" format conversions. That is, you can have a format that has both encoding and decoding supported (such as spdx-json), or a format that only allows for encoding (such as table [not implemented yet]).
  • enables encode operations without needing to use the presenter abstraction (which is meant specifically for deferred execution of encoding bound to a certain SBOM object)

This PR has moved a lot of code and large test fixtures. To better orient what is really happening I'd suggest reviewing a commit at a time in order... but these are the most important ones:

  • Take a look at the new format abstraction first: 9ded1c4
  • The next commits go over specific implementations that return the format struct: c1346ad, f6cc0c8
  • We expose this new encode/decode behavior in the main syft lib API: 12b44af

The remaining commits involve adding tests, moving existing test-fixtures, removing some presenter implementations, and making common SPDX helpers available to other packages.... essentially it's what "shakes out" after getting the abstraction and initial implementations added.

Why move both JSON and SPDX-JSON? I attempted to move just SPDX-JSON to implement anchore/grype#395, however, in doing so there was a lot of overlap with the JSON decoding implementation in grype. Instead of having this live in two places it was easier conceptually to go ahead an move the JSON decoding into the new approach.

Regarding details of the SPDX-JSON fields: new fields were added to the SPDX-JSON structs such that encoding-decoding loops are NOT lossy. That is, Before this PR if you were to encoding a SPDX-JSON document it would be lossy relative to the internal syft document... with the adjustments in this PR this is no longer the case. Why do this? In order to support keeping the package.Metadata fields that are sensitive to vulnerability scanning in grype. However, selecting ONLY the fields that grype needs is brittle from syft's point of view ("why were these fields persisted and those not?!?"). The initial implementation is lossless for this reason (and the added reason that selecting a different shape would require custom pkg.*Metadata struct definitions for each customization, which seems difficult to maintain!).

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
@wagoodman
Copy link
Contributor Author

Closing to keep the context here --a new PR will introduce the new pattern and one ported format

@wagoodman wagoodman closed this Oct 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant