Add an 'audit' goal which checks lockfiles against reported vulnerabilities. #20838

TansyArron · 2024-04-23T22:39:49Z

This fetches vulnerability data from the PyPI JSON API and prints a vulnerability report for each lockfile in the repo.

Current results from the Pants repo:

I've added a way to exclude specific vulnerabilities from the report, as sometimes they're not relevant. For example, GHSA-w596-4wvx-j9j6 only affects Subversion projects.

cburroughs · 2024-04-24T01:49:32Z

Neat! Some prior art/discussion in #16495 & #16288

Could you give some design context for going straight to the pypi json api instead of trying to wrap an existing tool like pip-audit?

TansyArron · 2024-04-24T20:16:39Z

> Neat! Some prior art/discussion in #16495 & #16288
>
> Could you give some design context for going straight to the pypi json api instead of trying to wrap an existing tool like pip-audit?

Thanks for the links! Interesting to see someone else's build of this tool.

The pip-audit tool is itself a wrapper around the PyPI/OSV vulnerability APIs, but it also does a bunch of extra work we don't need or want. For example, it performs a full resolve and potentially downloads more packages. It also goes poking through your local pip cache, which doesn't play well with being sandboxed.

Because I wrote this tool to operate only over lockfiles, we know we can skip all of that extra work. It works out smaller and lighter weight, and doesn't require a bunch of new 3rdparty deps.

I've tried to write it in such a way that we can add an osv backend to it if people prefer that over pypi.
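To illustrate the lockfile-only approach described above, here is a minimal sketch, not the code in this PR. It assumes the per-release PyPI JSON endpoint (`https://pypi.org/pypi/<project>/<version>/json`) and a `vulnerabilities` list in its response whose entries carry `id` and `withdrawn` keys. The function names and the sample payload are hypothetical, and no network request is made here.

```python
from __future__ import annotations

from typing import Any

# Assumed URL shape for the per-release PyPI JSON endpoint.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/{version}/json"


def pypi_release_url(name: str, version: str) -> str:
    """Build the URL for one pinned requirement taken from a lockfile."""
    return PYPI_JSON_URL.format(name=name, version=version)


def extract_vulnerability_ids(payload: dict[str, Any]) -> list[str]:
    """Return the ids of non-withdrawn vulnerabilities in a decoded response.

    The "vulnerabilities", "id" and "withdrawn" keys are assumptions about
    the response shape, not something confirmed by this PR.
    """
    ids = []
    for vuln in payload.get("vulnerabilities", []):
        if vuln.get("withdrawn"):
            continue  # skip advisories that have been withdrawn
        ids.append(vuln["id"])
    return ids


# Hypothetical decoded response for a single pinned package.
sample_payload = {
    "info": {"name": "example-package", "version": "1.0.0"},
    "vulnerabilities": [
        {"id": "EXAMPLE-0001", "withdrawn": None},
        {"id": "EXAMPLE-0002", "withdrawn": "2024-01-01T00:00:00Z"},
    ],
}

print(pypi_release_url("example-package", "1.0.0"))
print(extract_vulnerability_ids(sample_payload))
```

Since every package in a lockfile is already pinned to an exact version, one such lookup per pinned package would be enough, with no resolve step and no access to a local pip cache.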

kaos

Really nice stuff!

I think we may want to split the implementation into the core audit goal feature, and then have the Python part with pip_audit in the python backend. cf. how the publish goal is partitioned.

(some inconsistencies in naming as well, pip audit vs. pypi audit..)

kaos · 2024-04-25T06:25:52Z

src/python/pants/backend/audit/audit.py

+            request_type.field_set_type.create(target)  # type: ignore[misc]
+            for target in targets
+            if (
+                request_type.tool_id in specified_ids


I think the tool filter ought to be applied outside of the request_type constructor, to avoid creating empty AuditRequests.

All of this is in a comprehension - if the if on line 102 doesn't pass, line 100 will not execute.

Edited to add: I double checked this with:

    a = [1, 2, 3, 4]

    def foo(x):
        raise Exception(x)

    evens = tuple(foo(x) for x in a if x % 2 == 0)

This results in Exception: 2

> if the if on line 102 doesn't pass, line 100 will not execute.

Yes, that is clear. But if the request type's tool id is not in the specified_ids, we will not create any field sets, resulting in an empty AuditRequest. Moving the check outside the constructor will eliminate these no-op requests.

Ah, I see. Good catch!
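The difference between the two placements of the filter can be sketched in isolation. This is only an illustration: `FakeAuditRequest` and both builder functions are hypothetical stand-ins, not the Pants types from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FakeAuditRequest:
    """Hypothetical stand-in for a request holding field sets for one tool."""

    tool_id: str
    field_sets: tuple[str, ...]


def build_filter_inside(tool_ids, targets, specified_ids):
    # The filter lives in the inner comprehension, so a request is still
    # built for every tool; unselected tools just get no field sets.
    return [
        FakeAuditRequest(
            tool_id,
            tuple(t for t in targets if tool_id in specified_ids),
        )
        for tool_id in tool_ids
    ]


def build_filter_outside(tool_ids, targets, specified_ids):
    # The filter lives in the outer comprehension, so unselected tools
    # never get a request at all.
    return [
        FakeAuditRequest(tool_id, tuple(targets))
        for tool_id in tool_ids
        if tool_id in specified_ids
    ]


tool_ids = ["pypi-audit", "other-audit"]
targets = ["lockfile-a", "lockfile-b"]
specified_ids = {"pypi-audit"}

inside = build_filter_inside(tool_ids, targets, specified_ids)
outside = build_filter_outside(tool_ids, targets, specified_ids)
print(len(inside), len(outside))
```

With the filter inside, the unselected tool still produces a request whose `field_sets` is empty; with the filter outside, that no-op request is never created.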

src/python/pants/backend/audit/pip_audit_rule_test.py

kaos · 2024-04-25T06:50:18Z

src/python/pants/backend/audit/audit.py

+    specified_ids = determine_specified_tool_ids(
+        "audit",
+        [
+            "pypi-audit",
+        ],
+        request_types,
+    )


I'm assuming this is a temporary hack until a proper option is added...

Yeah, at present the only audit available is running against the pypi vulnerability database. In future we might want to use osv, or add ways to audit 3rdparty deps for languages other than python.

TansyArron · 2024-04-25T19:35:12Z

> Really nice stuff!
>
> I think we may want to split the implementation into the core audit goal feature, and then have the Python part with pip_audit in the python backend. cf. how the publish goal is partitioned.
>
> (some inconsistencies in naming as well, pip audit vs. pypi audit..)

Just to make sure I've got this straight:

The generic audit goal moves to src/python/pants/core/goals including the generic Result/Request/Subsystem definitions.
pip_audit_rule.py moves to src/python/pants/backend/python/goals

Do I leave anything in experimental? Where should the helper code (format_results.py, pip_audit.py) live?

The naming is a bit of a mess. If anyone's got preferences I'm all ears. pip-audit is a thing people are already familiar with as a tool. pypi-audit is slightly more accurate in that we're actually hitting the pypi api. python_3rdparty_dependency_vulnerability_audit is most accurate but crazy long. 🤷

src/python/pants/backend/audit/audit.py

kaos · 2024-04-26T07:26:10Z

> > Really nice stuff!
> > I think we may want to split the implementation into the core audit goal feature, and then have the Python part with pip_audit in the python backend. cf. how the publish goal is partitioned.
> > (some inconsistencies in naming as well, pip audit vs. pypi audit..)
>
> Just to make sure I've got this straight:
>
> The generic audit goal moves to src/python/pants/core/goals including the generic Result/Request/Subsystem definitions.

Ah, right, that was slightly misleading phrasing on my part. This doesn't have to be a core builtin goal; it can stay in a backend (where it is now), with pants.backend.audit holding the "core" (or perhaps rather the "common") parts of the audit feature. (The register.py file can stay in pants.backend.experimental.audit to keep the experimental status.)

> pip_audit_rule.py moves to src/python/pants/backend/python/goals

Yes, and perhaps register these python specific audit rules from pants.backend.experimental.python?

> Do I leave anything in experimental? Where should the helper code (format_results.py, pip_audit.py) live?

Keep audit-related register.py files under experimental backends. I think these util files for the Python audit could live in src/python/pants/backend/python/audit/ (similar to other features, where the code related to a feature is grouped in that backend).

> The naming is a bit of a mess. If anyone's got preferences I'm all ears. pip-audit is a thing people are already familiar with as a tool. pypi-audit is slightly more accurate in that we're actually hitting the pypi api. python_3rdparty_dependency_vulnerability_audit is most accurate but crazy long. 🤷

I think pip-audit is misleading, as we're not using pip. The long name, although accurate, could be a bit too generic: if alternatives turn up, they could all potentially fit that description. So my preference leans towards pypi-audit as the most precise.

cburroughs · 2024-04-26T14:04:27Z

(Narrowly on the naming, if we are not using `pip-audit` the program, we should not call it "pip-audit". pypi-audit is reasonable.)

cburroughs · 2024-04-26T14:04:57Z

src/python/pants/backend/audit/audit.py

+
+
+class AuditSubsystem(GoalSubsystem):
+    name = "audit"


Should this be experimental-audit for now like we did for deploy?

We certainly could. I guess we could look at how big the risk of change is?
Deploy feels like it could be more sensitive to how it works, compared to presenting a list of applicable security reports, which could warrant this going straight to a "stable" goal name. For me it's fine either way, but I'm 👍🏽 for having the discussion to settle it.

cburroughs · 2024-04-26T14:08:06Z

pants.toml

@@ -240,3 +241,6 @@ args = ["-Yrangepos", "-Xlint:unused"]

 [scala-infer]
 force_add_siblings_as_dependencies = false
+
+[pypi-audit]
+lockfile_vulnerability_excludes = { "python-default" = ["GHSA-w596-4wvx-j9j6"] }


Could you add a comment for why this can be ignored? It's not just an example, correct?

This is just me being unfamiliar with the space, but how do we end up with a GitHub advisory ID instead of a CVE or python specific one?

Agree. I think every exception should be made with a note as to why it's being excluded. Perhaps this should be generally encouraged, so we could have a syntax like:

Suggested change

-lockfile_vulnerability_excludes = { "python-default" = ["GHSA-w596-4wvx-j9j6"] }
+lockfile_vulnerability_excludes = { "python-default" = [{"id": "GHSA-w596-4wvx-j9j6", "note": "This is N/A for reasons a, b and c."}] }

and potentially warn/err if there is no note.

These excludes could then be briefly mentioned, along with the note, when running pants audit for visibility.
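As a rough sketch of how the suggestion above could behave once the option value has been parsed, the following accepts both plain ids and `id`/`note` entries, warns when a note is missing, and keeps the excluded entries so they can be mentioned in the report. All names here (`VulnerabilityExclude`, `parse_excludes`, `partition_vulnerabilities`) and the `EXAMPLE-…` ids are hypothetical; this is not the option handling in the PR.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class VulnerabilityExclude:
    """Hypothetical parsed form of one exclude entry."""

    id: str
    note: str | None = None


def parse_excludes(raw):
    """Normalize {resolve: [id or {"id": ..., "note": ...}]} into objects."""
    parsed = {}
    for resolve, entries in raw.items():
        excludes = []
        for entry in entries:
            if isinstance(entry, str):
                exclude = VulnerabilityExclude(id=entry)
            else:
                exclude = VulnerabilityExclude(id=entry["id"], note=entry.get("note"))
            if not exclude.note:
                # Warn (rather than fail) when no justification is given.
                logger.warning(
                    "Exclude %s for resolve %s has no note.", exclude.id, resolve
                )
            excludes.append(exclude)
        parsed[resolve] = excludes
    return parsed


def partition_vulnerabilities(vuln_ids, excludes):
    """Split ids into those to report and the excludes that matched."""
    by_id = {exclude.id: exclude for exclude in excludes}
    reported = [v for v in vuln_ids if v not in by_id]
    skipped = [by_id[v] for v in vuln_ids if v in by_id]
    return reported, skipped


raw_option = {
    "python-default": [
        "EXAMPLE-0001",
        {"id": "EXAMPLE-0002", "note": "Not applicable: feature unused."},
    ]
}
excludes = parse_excludes(raw_option)
reported, skipped = partition_vulnerabilities(
    ["EXAMPLE-0002", "EXAMPLE-0003"], excludes["python-default"]
)
print(reported, [(s.id, s.note) for s in skipped])
```

Returning the matched excludes alongside the remaining ids is what would let the audit output list each skipped advisory together with its note.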

huonw · 2024-05-08T05:00:57Z

Thanks for the contribution.

When you have a chance (and this is close to ready), please merge main (or rebase onto it) and add some release notes to docs/notes/2.22.x.md (maybe in the Python section). See #20888 for more info.

Tansy Arron added 9 commits April 23, 2024 11:30

constraints strings handling.

cdaae0d

format results.

67dba0c

Improve formatting.

b674289

Formatting and general improvments.

cdd1545

Rebase related updates.

204d393

tests.

fbe9695

lint/fmt

9d3d55c

Lint Fixes: use MultiGet and don't use map.

f1a4aea

further cleanup.

bcfe521

TansyArron added the category:new feature label Apr 23, 2024

Tansy Arron added 2 commits April 24, 2024 13:24

fmt.

c322515

why does mypy hate me.

f528375

TansyArron force-pushed the tansy/audit branch from 09234b0 to f528375 Compare April 24, 2024 20:56

Move everything but register.py out of experimental.

6d72404

kaos reviewed Apr 25, 2024

Remove extraneous test data, mock requests.get.

d71efe1

kaos reviewed Apr 26, 2024

cburroughs reviewed Apr 26, 2024

Update src/python/pants/backend/audit/audit.py

55cc6e5

Co-authored-by: Andreas Stenius <git@astekk.se>
