Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an 'audit' goal which checks lockfiles against reported vulnerabilities. #20838

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

TansyArron
Copy link
Contributor

This fetches vulnerability data from the pypi json api and prints a vulnerability report for each lockfile in the repo.

Current results from the Pants repo:

Screenshot 2024-04-23 at 6 34 14 PM

I've added a way to exclude specific vulnerabilities from the report, as sometimes they're not relevant. eg: GHSA-w596-4wvx-j9j6 only affects subversion projects.

@cburroughs
Copy link
Contributor

Neat! Some prior art/discussion in #16495 & #16288

Could you give some design context for going straight to the pypi json api instead of trying to wrap an existing tool like pip-audit?

@TansyArron
Copy link
Contributor Author

Neat! Some prior art/discussion in #16495 & #16288

Could you give some design context for going straight to the pypi json api instead of trying to wrap an existing tool like pip-audit?

Thanks for the links! Interesting to see someone elses build of this tool.

The pip-audit tool is itself a wrapper around the pypi/osv vulnerability api's, but it also does a bunch of extra work we don't need or want - eg: it performs a full resolve and potentially downloads more packages. It also goes poking through your local pip cache which doesn't play well with being sandboxed.

Because I wrote this tool to operate only over lockfiles we know we can skip all of that extra work. It works out smaller, lighter weight, and doesn't require a bunch of new 3rdparty deps.

I've tried to write it in such a way that we can add an osv backend to it if people prefer that over pypi.

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice stuff!

I think we may want to split the implementation into the core audit goal feature, and then have the Python part with pip_audit in the python backend. cf. how the publish goal is partitioned.

(some inconsistencies in naming as well, pip audit vs. pypi audit..)

request_type.field_set_type.create(target) # type: ignore[misc]
for target in targets
if (
request_type.tool_id in specified_ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tool filter ought to be applied outside of the request_type constructor, to avoid creating empty AuditRequests.`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this is in a comprehension - if the if on line 102 doesn't pass, line 100 will not execute.

Edited to add: I double checked this with:

a = [1,2,3,4]

def foo(x):
    raise Exception(x)

evens = tuple(foo(x) for x in a if x%2 == 0)

This results in Exception: 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the if on line 102 doesn't pass, line 100 will not execute.

Yes, that is clear. But if the request types tool id is not in the specified_ids, we will not create any field sets, resulting in an empty AuditRequest. Moving the check outside the constructor will eliminate these no-op requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Good catch!

src/python/pants/backend/audit/pip_audit_rule_test.py Outdated Show resolved Hide resolved
Comment on lines +91 to +97
specified_ids = determine_specified_tool_ids(
"audit",
[
"pypi-audit",
],
request_types,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is temporary hack until a proper option is added...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, at present the only audit available is running against the pypi vulnerability database. In future we might want to use osv, or add ways to audit 3rdparty deps for languages other than python.

@TansyArron
Copy link
Contributor Author

Really nice stuff!

I think we may want to split the implementation into the core audit goal feature, and then have the Python part with pip_audit in the python backend. cf. how the publish goal is partitioned.

(some inconsistencies in naming as well, pip audit vs. pypi audit..)

Just to make sure I've got this straight:

  • The generic audit goal moves to src/python/pants/core/goals including the generic Result/Request/Subsystem definitions.
  • pip_audit_rule.py moves to src/python/pants/backend/python/goals

Do I leave anything in experimental? Where should the helper code (format_results.py, pip_audit.py) live?

The naming is a bit of a mess. If anyone's got preferences I'm all ears. pip-audit is a thing people are already familiar with as a tool. pypi-audit is slightly more accurate in that we're actually hitting the pypi api. python_3rdparty_dependency_vulnerability_audit is most accurate but crazy long. 🤷

@kaos
Copy link
Member

kaos commented Apr 26, 2024

Really nice stuff!
I think we may want to split the implementation into the core audit goal feature, and then have the Python part with pip_audit in the python backend. cf. how the publish goal is partitioned.
(some inconsistencies in naming as well, pip audit vs. pypi audit..)

Just to make sure I've got this straight:

  • The generic audit goal moves to src/python/pants/core/goals including the generic Result/Request/Subsystem definitions.

Ah, right, slightly misleading phrasing on my part. This doesn't have to be a core builtin goal, but stay in a backend (where it is), being the "core" (or perhaps rather, the "common" parts of the audit feature) in the pants.backend.audit backend. (The register.py file can stay in pants.backend.experimental.audit for the experimental status.)

  • pip_audit_rule.py moves to src/python/pants/backend/python/goals

Yes, and perhaps register these python specific audit rules from pants.backend.experimental.python?

Do I leave anything in experimental? Where should the helper code (format_results.py, pip_audit.py) live?

Keep audit related register.py files under experimental backends, and I think these util files for the python audit could live in src/python/pants/backend/python/audit/ (similar to other features that group related code for a feature is in that backend.)

The naming is a bit of a mess. If anyone's got preferences I'm all ears. pip-audit is a thing people are already familiar with as a tool. pypi-audit is slightly more accurate in that we're actually hitting the pypi api. python_3rdparty_dependency_vulnerability_audit is most accurate but crazy long. 🤷

I think pip-audit is misleading, as we're not using pip. The long name, although accurate, could be a bit too generic, in case alternatives turn up, they could all potentially fit that description, so I think my preference leans towards pypi-audit being the most precise.

@cburroughs
Copy link
Contributor

(Narrowly on the naming, if we are not using `pip-audit the program we should not call it "pip-audit". pypi-audit is reasonable.)



class AuditSubsystem(GoalSubsystem):
name = "audit"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be experimental-audit for now like we did for deploy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We certainly could. I guess we could look at how big risk for change is there?
Deploy feels like it could be more sensitive to how it works, compared to presenting a list of applicable security reports, which could warrant this to go straight to a "stable" goal name.. for me it's fine either way, but I'm 👍🏽 for having the discussion to settle it.

@@ -240,3 +241,6 @@ args = ["-Yrangepos", "-Xlint:unused"]

[scala-infer]
force_add_siblings_as_dependencies = false

[pypi-audit]
lockfile_vulnerability_excludes = { "python-default" = ["GHSA-w596-4wvx-j9j6"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Could you add a comment for why this can be ignored? It's not just an example, correct?
  • This is just me being unfamiliar with the space, but how do we end up with a GitHub advisory ID instead of a CVE or python specific one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I think every exception should be made with a note as to why it's being excluded. Perhaps this should be generally encouraged, so we could have a syntax like:

Suggested change
lockfile_vulnerability_excludes = { "python-default" = ["GHSA-w596-4wvx-j9j6"] }
lockfile_vulnerability_excludes = { "python-default" = [{"id": "GHSA-w596-4wvx-j9j6", "note": "This is N/A for reasons a, b and c."}] }

and potentially warn/err if there is no note.

These excludes could then be briefly mentioned, along with the note, when running pants audit for visibility.

Co-authored-by: Andreas Stenius <git@astekk.se>
@huonw
Copy link
Contributor

huonw commented May 8, 2024

Thanks for the contribution.

When you have a chance (and this is close to ready), please merge main (or rebase onto it) and add some release notes to docs/notes/2.22.x.md (maybe in the Python section). See #20888 for more info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants