Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement RFC 3553 to add SBOM support #13709

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

justahero
Copy link
Contributor

@justahero justahero commented Apr 5, 2024

What does this PR try to resolve?

This PR is an implementation of RFC 3553 to add support to generate pre-cursor SBOM files for compiled artifacts in Cargo.

How should we test and review this PR?

The RFC 3553 adds a new option to Cargo to emit SBOM pre-cursor files. A project can be configured either by the new Cargo config field sbom.

# .cargo/config.toml
[build]
sbom = true

or using the environment variable CARGO_BUILD_SBOM=true. The sbom option is an unstable feature and requires the -Zsbom flag to enable it.

Check out this branch & compile Cargo. Pick a Cargo project to test it on, then run:

CARGO_BUILD_SBOM=true <path/to/compiled/cargo>/target/debug/cargo build -Zsbom

All generated *.cargo-sbom.json files are located in the target folder alongside their artifacts. To list all generated files use:

find ./target -name "*.cargo-sbom.json"

then check their content. To see the current output format, see these examples.

What does the PR not solve?

The PR leaves a task(s) open that are either out of scope or should be done in a follow-up PRs.

Additional information

There are a few things that I would like to get feedback on, in particular the generated JSON format is not final. Currently it holds the information listed in the RFC 3553, but it could be further enriched with information only available during builds.

During the implementation a number of questions arose.

  • Is using the UnitGraph the right structure to determine all dependencies?
  • Is the location in the compile method to generate the SBOM files appropriate?
  • Is there important information missing from the generated output, some fields that should be omitted?
  • How best to check JSON output in the related tests in testsuite, are useful tests missing?
  • Are custom build script dependencies correctly resolved?

Thanks @arlosi, @RobJellinghaus and @lfrancke for initial guidance & feedback.

@rustbot
Copy link
Collaborator

rustbot commented Apr 5, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added A-build-execution Area: anything dealing with executing the compiler A-configuration Area: cargo config files and env vars A-unstable Area: nightly unstable support S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 5, 2024
@heisen-li
Copy link
Contributor

Much respect for your contribution.

From my kind reminders, it seems appropriate to modify the documentation of the corresponding sections, e.g. Configuration, Environment Variables.

@weihanglo
Copy link
Member

Thanks for the reminder, @heisen-li. Would love to see a doc update, though we should probably focus on the design discussion first, as the location of the configuration is not yet decided. (See rust-lang/rfcs#3553 (comment)).

@epage
Copy link
Contributor

epage commented Apr 9, 2024

One approach for the docs (if this is looking to be merged) is to put the env and config documentation fragments in the Unstable docs.

@justahero justahero force-pushed the rfc3553/cargo-sbom-support branch from 190682e to ae0881c Compare May 2, 2024 19:54
src/cargo/core/compiler/mod.rs Show resolved Hide resolved
src/cargo/core/compiler/build_runner/mod.rs Show resolved Hide resolved
.iter()
.filter(|o| matches!(o.flavor, FileFlavor::Normal | FileFlavor::Linkable))
.filter_map(|output_file| output_file.hardlink.as_ref())
.map(|link_dst| link_dst.with_extension("cargo-sbom.json"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to handle name collisions here? For example dylibs might collide with each their as their name have no -<hash> suffix.

See #6313 for some details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I'm not sure how to solve this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already a problem for debug info, e.g. .pdb files on Windows. They have to be attributable to the file they correspond to, so they have the same name and cause a name collision. Cargo currently prints a warning about it, but otherwise just lets the collision happen and one of the debug info files is lost. However, losing SBOM info is highly undesirable.

We are not constrained by having the exact same name for the SBOM (unlike for debug info where we don't control the way it is discoverd), so we could do something about this.

I think we should emit all SBOM files in case of a collision, with a predictable naming scheme such as adding a number to the extension - foo.sbom, then foo.1.sbom and so on. The SBOM file will then contain the name of the binary it corresponds to, so that it can be disambiguated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should emit all SBOM files in case of a collision, with a predictable naming scheme such as adding a number to the extension - foo.sbom, then foo.1.sbom and so on. The SBOM file will then contain the name of the binary it corresponds to, so that it can be disambiguated.

This is a good idea. As an experimental feature we could have this temporarily. And We need to make sure this must be resolved before stabilization at least.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we append the .cargo-sbom.json extension rather than change the extension? That way if the SBOM file has a name collision, then the binary would also have a name collision. The name collision problem still needs to be solved, but I don't think we should solve it only for the sbom file.

src/cargo/core/compiler/output_sbom.rs Outdated Show resolved Hide resolved
assert!(p.bin("foo").with_extension("cargo-sbom.json").is_file());
assert_eq!(
1,
p.glob(p.target_debug_dir().join("libfoo.cargo-sbom.json"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we might need to deal with different naming convention on different platform. (Windows specifically?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the glob call can be simplified in a way that it works for all platforms.

tests/testsuite/sbom.rs Outdated Show resolved Hide resolved
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just note that I reviewed this as-is, didn't really think too much for the design itself. Thank you for working on this!

src/cargo/core/compiler/build_config.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/build_config.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/output_sbom.rs Outdated Show resolved Hide resolved
@bors
Copy link
Collaborator

bors commented May 3, 2024

☔ The latest upstream changes (presumably #13571) made this pull request unmergeable. Please resolve the merge conflicts.

@justahero justahero force-pushed the rfc3553/cargo-sbom-support branch 4 times, most recently from 1cfd71a to 376fe1e Compare May 6, 2024 13:42
@rustbot rustbot added the A-documenting-cargo-itself Area: Cargo's documentation label May 6, 2024
Similar to the generation of `depinfo` files, a function is called to
generated SBOM precursor file named `output_sbom`. It takes the
`BuildRunner` & the current `Unit`. The `sbom` flag can be specified as
a cargo build option, but it's currently not configured correctly. To
test the generation the flag is set to `true`.

This passes in the cargo build config `sbom`.
This ignores dependencies for custom build scripts. The output should be
similar to what `cargo tree` reports.
This is similar to what the `cargo metadata` command outputs.
This extracts the logic to get the list of SBOM output file paths into
its own function in `BuildRunner` for a given Unit.
* extract sbom config into helper function
@justahero justahero force-pushed the rfc3553/cargo-sbom-support branch 3 times, most recently from 512c7ff to 67332d6 Compare May 7, 2024 07:40
This expands the tests to reflect end-to-end tests by comparing the
generated JSON output files with expected strings.

* add test helper to compare actual & expected JSON content
* refactor setup of packages in test
@justahero justahero force-pushed the rfc3553/cargo-sbom-support branch from 67332d6 to 0aa10e9 Compare May 7, 2024 11:13
@justahero justahero marked this pull request as ready for review May 7, 2024 11:53
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I like the idea of having this PR to explore SBOM format. I'll post back issues we've found so far to the RFC. Thank you :)

.iter()
.filter(|o| matches!(o.flavor, FileFlavor::Normal | FileFlavor::Linkable))
.filter_map(|output_file| output_file.hardlink.as_ref())
.map(|link_dst| link_dst.with_extension("cargo-sbom.json"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should emit all SBOM files in case of a collision, with a predictable naming scheme such as adding a number to the extension - foo.sbom, then foo.1.sbom and so on. The SBOM file will then contain the name of the binary it corresponds to, so that it can be disambiguated.

This is a good idea. As an experimental feature we could have this temporarily. And We need to make sure this must be resolved before stabilization at least.

src/doc/src/reference/unstable.md Show resolved Hide resolved
.masquerade_as_nightly_cargo(&["sbom"])
.with_stderr(
"\
warning: ignoring 'sbom' config, pass `-Zsbom` to enable it\n\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
warning: ignoring 'sbom' config, pass `-Zsbom` to enable it\n\
[WARNING] ignoring 'sbom' config, pass `-Zsbom` to enable it\n\

src/cargo/core/compiler/output_sbom.rs Show resolved Hide resolved
fn from(target: &Target) -> Self {
SbomTarget {
kind: target.kind().clone(),
crate_type: target.kind().rustc_crate_types().first().cloned(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the fact that we might have multiple crate-types in one unit, this seems to lose that information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't thought of the multiple crate-types when I wrote the RFC. It seems we probably need to make this field an array in the JSON so we're not losing this.

dependencies: Vec<SbomDependency>,
build_type: SbomBuildType,
) -> Self {
let package_id = dep.unit.pkg.package_id();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the top level package. Package ID Spec is the only stable representation in Cargo to describe a package version coordinate.

Suggested change
let package_id = dep.unit.pkg.package_id();
let package_id = dep.unit.pkg.package_id().to_spec();

}

#[derive(Serialize)]
struct Sbom {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a dependencies field for this top-level Sbom?

(Just a question. I don't really know if other SBOM formats need it to recover the dependency graph)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, yes. Copying my comment from the RFC:

Note that there are two ways of looking at dependencies: what each package needs, and the final resolved graph.

For example, if one package depends on rand with features = ["std", "getrandom"], and another with features = ["std", "simd_support"], the final resolved features will be ["std", "getrandom", "simd_support"]. Depending on the use case you may need either or both representations (direct package dependencies and the resolved graph).

cargo metadata exposes both (under "packages" and "resolve" fields), but inaccurately:

I think it would be best for the SBOM to also expose both, accurately this time.

So what I would like to see is two resolved dependency trees: one for normal dependencies and one for build dependencies, matching the way feature resolver v2 works.

version: String,
source: String,
target: SbomTarget,
profile: Profile,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad news. People can override profiles for individual packages. So, only a top-level profile might not be enough to represent the build.

(The truth is, I am not an SBOM expert, so just provide information for you to consider)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. This makes it seem like the profile needs to be captured for each package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it sufficient to only include a package's profile when it differs from the root level one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have some extra tests

  • A package with multiple crate-types.
  • A package that artifact name conflicts.

.file("src/lib.rs", "pub fn bar() -> i32 { 2 }")
.file(
"build.rs",
r#"fn main() { println!("cargo::rustc-cfg=foo"); }"#,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need to add one more build script instruction cargo::rustc-check-cfg=cfg(has_foo) to avoid unexpected_cfg lint error.

See https://blog.rust-lang.org/2024/05/06/check-cfg.html.

.iter()
.filter(|o| matches!(o.flavor, FileFlavor::Normal | FileFlavor::Linkable))
.filter_map(|output_file| output_file.hardlink.as_ref())
.map(|link_dst| link_dst.with_extension("cargo-sbom.json"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we append the .cargo-sbom.json extension rather than change the extension? That way if the SBOM file has a name collision, then the binary would also have a name collision. The name collision problem still needs to be solved, but I don't think we should solve it only for the sbom file.

src/cargo/core/compiler/build_config.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/output_sbom.rs Show resolved Hide resolved
fn from(target: &Target) -> Self {
SbomTarget {
kind: target.kind().clone(),
crate_type: target.kind().rustc_crate_types().first().cloned(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't thought of the multiple crate-types when I wrote the RFC. It seems we probably need to make this field an array in the JSON so we're not losing this.

}

#[derive(Serialize, Clone)]
struct SbomRustc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect it would be the same as when running rustc -vV with a local build. What do we get here currently for a locally built rustc?

version: String,
source: String,
target: SbomTarget,
profile: Profile,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. This makes it seem like the profile needs to be captured for each package.

src/cargo/core/compiler/output_sbom.rs Outdated Show resolved Hide resolved
Instead of replacing the file extension, the `.cargo-sbom.json` suffix
is appended to the output file. This is to keep existing file extensions
in place.

* refactor logic to set `sbom` property from build config
* expand build script related test to check JSON output
* use `PackageIdSpec` instead of only `PackageId` in SBOM output
* change `version` of a dependency to `Option<Version>`
* output `Vec<CrateType>` instead of only the first found crate type
* output rustc workspace wrapper
* update 'warning' string in test using `[WARNING]`
* use `serde_json::to_writer` to serialize SBOM
* set sbom suffix in tests explicitely, instead of using `with_extension`
In case a unit's profile differs from the profile information on root
level, it's added to the package information to the JSON output.

The verbose output for `rustc -vV` is also written to the `rustc` field
in the SBOM.

* rename `fetch_packages` to `collect_packages`
* update JSON in tests to include profile information
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-execution Area: anything dealing with executing the compiler A-configuration Area: cargo config files and env vars A-documenting-cargo-itself Area: Cargo's documentation A-unstable Area: nightly unstable support S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants