
Introduce new format pattern + port json processing #550

Merged
merged 12 commits on Oct 20, 2021


wagoodman
Contributor

Partially addresses anchore/grype#395

Today we have the presenter abstraction to write out internal SBOM results in a particular format (e.g. JSON, table, SPDX, CycloneDX). All model details about a format are contained within the presenter objects themselves. This abstraction was never meant to handle intake (parsing) of these formats. As we add more formats, it would be ideal to be able to parse them for use in other tools (such as syft). For this reason, this PR adds a new format abstraction that specifies the encoding/decoding details for a particular SBOM format (the presenter can then use format encoders to deal with presentation concerns).

The first commit of this PR adds the new format abstractions:

  • type Encoder ...: a function signature for encoding a given set of SBOM objects into bytes that are written to a writer.
  • type Decoder ...: a function signature for decoding an SBOM from a reader and returning SBOM objects.
  • type Validator ...: a function signature for observing the bytes of an SBOM document via a reader and returning an error if the given document is not of a specific format.
  • type Format ...: combines the above functionality into a single object for a specific format.
  • Option (with helper functions)

This new pattern allows for:

  • specifying not only the encoding but also the decoding of a format
  • separating the concern of presenting to a writer (behavior oriented) from specifying the shape of the SBOM format document (data oriented)
  • "one way" and "two way" format conversions: a format can support both encoding and decoding (such as spdx-json), or encoding only (such as table [not implemented yet])
  • encode operations without the presenter abstraction (which is meant specifically for deferred execution of encoding bound to a certain SBOM object)

This PR ports encoding from the syft presenters and decoding from the grype pkg.Provider implementation for the syftjson format.

Additionally, the encode/decode behavior has been exposed in the syft top-level API, bypassing the presenter abstraction for the typical user (where deferred encoding is not necessary).

@wagoodman requested a review from a team October 14, 2021 15:55
@wagoodman self-assigned this Oct 14, 2021

github-actions bot commented Oct 14, 2021

Benchmark Test Results

Benchmark results from the latest changes vs base branch
name                                                   old time/op    new time/op    delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2          1.11ms ± 2%    1.06ms ± 5%  -3.81%  (p=0.032 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2        1.86ms ± 2%    1.83ms ± 7%    ~     (p=0.222 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     535µs ± 4%     510µs ± 2%  -4.56%  (p=0.016 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 533µs ± 1%     520µs ± 2%  -2.52%  (p=0.016 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  542µs ± 3%     529µs ± 2%    ~     (p=0.151 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  11.2ms ± 4%    10.8ms ± 2%  -3.66%  (p=0.032 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  849µs ± 2%     818µs ± 5%    ~     (p=0.095 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2       739ns ± 2%     747ns ± 2%    ~     (p=0.421 n=5+5)

name                                                   old alloc/op   new alloc/op   delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           147kB ± 0%     146kB ± 0%    ~     (p=0.056 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2         755kB ± 0%     755kB ± 0%    ~     (p=0.841 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     119kB ± 0%     119kB ± 0%    ~     (p=0.151 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 133kB ± 0%     133kB ± 0%    ~     (p=0.333 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  140kB ± 0%     140kB ± 0%  -0.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  2.74MB ± 0%    2.75MB ± 0%    ~     (p=0.095 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                 1.18MB ± 0%    1.18MB ± 0%  -0.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2        336B ± 0%      336B ± 0%    ~     (all equal)

name                                                   old allocs/op  new allocs/op  delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           2.41k ± 0%     2.41k ± 0%    ~     (all equal)
ImagePackageCatalogers/python-package-cataloger-2         9.58k ± 0%     9.58k ± 0%    ~     (p=0.627 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     1.99k ± 0%     1.99k ± 0%    ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                 2.54k ± 0%     2.54k ± 0%    ~     (all equal)
ImagePackageCatalogers/rpmdb-cataloger-2                  3.25k ± 0%     3.25k ± 0%    ~     (all equal)
ImagePackageCatalogers/java-cataloger-2                   37.5k ± 0%     37.5k ± 0%    ~     (p=1.000 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  2.49k ± 0%     2.49k ± 0%    ~     (p=0.444 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2        9.00 ± 0%      9.00 ± 0%    ~     (all equal)

@wagoodman added this to the Syft 1.0 milestone Oct 16, 2021
return f.decoder(reader)
}

func (f Format) Detect(b []byte) bool {
Contributor Author

From @luhring: could this be Validate?

Contributor

(and if so, probably adjust the signature to be consistent with our new Validator concept (e.g. return an error))

}

if err := f.validator(bytes.NewReader(b)); err != nil {
return false
Contributor Author

@wagoodman, Oct 18, 2021

From @spiffcs: maybe log at debug level?

Contributor Author

logging here probably won't be helpful since we're expecting validation errors (which could get noisy)

Contributor

@spiffcs left a comment

Approving based on the changes highlighted in our Zoom meeting.

The remaining TODOs seem large enough to warrant their own PRs, so they should not be included here.

)

// Encode takes all SBOM elements and a format option and encodes an SBOM document.
// TODO: encapsulate input data into common sbom document object
Contributor

Want me to keep track of the TODOs still in this PR and break them into separate, smaller issues?

Contributor Author

I already broke this into #555 and #554

Contributor

@luhring left a comment

Looks good! (pending resolution of the other threads of thought already discussed)

Excellent work!! 👏


import "io"

type Validator func(reader io.Reader) error
Contributor

nit: For future implementers, it could be nice to add a doc comment explaining what a Validator is intended to do. E.g., // A Validator assesses the input to determine blah blah blah ...

Contributor Author

agreed -- let me follow up with some useful doc strings

return err
}
s.Target = payload
default:
Contributor

nit: blank line above this line for consistency within the switch

return fmt.Errorf("unable to decode: %w", err)
}

// note: we accept al schema versions
Contributor

nit: al -> all


func Identify(by []byte) (*format.Format, error) {
for _, f := range All() {
if f.Detect(by) {
Contributor

(Potentially changing this to create a new reader, based on a previous comment)

)

func TestIdentify(t *testing.T) {

Contributor

nit: stray empty line

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
* add new format pattern

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* add syftjson format

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* add internal formats helper

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* add SBOM encode/decode to lib API

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* remove json presenter + update presenter tests to use common utils

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* remove presenter format enum type + add formats shim in presenter helper

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* add MustCPE helper for tests

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* update usage of format enum

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* add test fixtures for encode/decode tests

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* fix integration test

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* migrate format detection to use reader

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* address review comments

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

3 participants