
Feature: Add machine-readable remediation to the hasDangerousWorkflowScriptInjection probe #3950

pnacht opened this issue Mar 18, 2024 · 0 comments
Labels
check/Dangerous-workflow kind/enhancement New feature or request needs discussion question Further information is requested



pnacht commented Mar 18, 2024

Is your feature request related to a problem? Please describe.
The finding.Finding.Remediation.Patch field is meant to store machine-readable patches to fix the finding:

type Remediation struct {
	// Patch for machines.
	Patch *string `json:"patch,omitempty"`
	// ...
}

Fixing script injection in workflows is pretty straightforward and can be solved procedurally, but the probe's findings don't include the patch.

Describe the solution you'd like
The findings generated by hasDangerousWorkflowScriptInjection should include a remediation patch that is machine-readable, and preferably human-recognizable. I'd propose the "unified diff" format used by git diff or diff -u.

The utopian solution would allow us to create a single "global" patch that maintainers could then apply to the project, entirely fixing all the dangerous script injections. However, my understanding of the probe architecture leads me to believe we can only fix individual findings, one at a time.

I have already created some first-draft code that can perform this operation. Running it, we get output such as:

{
  # ...
  "findings":
    [
      {
        # ...
        "remediation":
          {
            "patch": "--- a/.github/workflows/pull_request_open_polling.yml\n+++ b/.github/workflows/pull_request_open_polling.yml\n...",
            # ...
          },
      },
      {
        # ...
        "remediation":
          {
            "patch": "--- a/.github/workflows/pull_request_update_polling.yml\n+++ b/.github/workflows/pull_request_update_polling.yml\n...",
            # ...
          },
      },
    ],
}

Pretty-printing each of those patches, we get a git diff-like* "unified" output that can be used to fix the relevant finding:

--- a/.github/workflows/pull_request_open_polling.yml
+++ b/.github/workflows/pull_request_open_polling.yml
@@ -8,10 +8,13 @@
   pull_request:
     types: [opened, reopened]
 
+env:
+  PR_HEAD_REF: ${{ github.event.pull_request.head.ref }}
+
 jobs:
   build:
     runs-on: jenkins-polling
 
     steps:
       - name: Run PullRequestJenkinsPoller
-        run: ... ${{ github.event.pull_request.head.ref }} ...
+        run: ... $PR_HEAD_REF ...

I'll be happy to send this code as a PR once I've polished it up.

However, there are several issues with my current implementation that I wish to discuss before moving forward:

git diff-like output

The biggest problem is generating the git diff output. It'd be great if we could use a dependency to create this output given two versions of a file, but a reasonable dependency doesn't really exist, as far as I can tell:

  • go-git (already used by Scorecard) can generate slices describing all the changes between the files (https://github.com/go-git/go-git/blob/master/utils/diff/diff.go), but it can't create the output in the "unified" format we need for git apply. That diff utility is itself basically just a wrapper for sergi/go-diff, which isn't currently used by Scorecard directly, so it would need to be added to Scorecard's go.mod.
  • hexops/gotextdiff and akedrou/textdiff are effectively identical packages that can create the necessary git diff output. However, these packages simply export an internal module used by Go tools (x/tools/internal/diff), and given their "one-and-done" concept, they've been archived by their maintainers (and, unsurprisingly, have poor Scorecard scores: 2.9).

I've explored two alternatives:

  • Using hexops/gotextdiff works just fine to generate the "unified diff", as far as I can tell. But it really is a lousy dependency to add to Scorecard.
  • I've also written some code to generate the "unified diff" ourselves based on go-git's output. It works for the handful of tests I've run, but it'd probably have to be more thoroughly tested to make sure it covers edge cases, etc.

How to store expected unit test outputs

This code would naturally need to be tested. Each test case will be composed of a workflow that needs to be fixed and all the information we'd receive from the finding (location, unsafe variable, etc). However, the unit tests also need to know what the expected output of each test would be. We need to decide how we wish to store this information. As I see it, our options are:

  • we can store the expected "fixed workflow" (e.g. workflow-fixed.yml), and then generate the git diff-like output during the test and compare it to ours. Conceptually, this would be the best solution, but the only way I see to generate that output would be to... run git diff via os/exec (exec.Command()) or something, which has all the downsides of shelling out.
    In this particular case, I don't think it'd be too problematic, though? Any machine running tests on Scorecard likely has git installed on the $PATH (and we can simply skip the test if it's not found), and the "unified format" has been stable since its release in 1990, so there isn't much risk of different results on machines with different versions of git.
  • store the expected git diff file (generated by running git diff on the broken workflow and a fixed version – which is untracked) and compare our output directly to it. This makes the test code simple, but means we're storing a human-legible-but-unfriendly format. It's also somewhat brittle: if we modify the original workflow (e.g. adding comments), that will change the git diff output as well (modified line positions, maybe even changing the diff itself), so the .diff file would also need to be updated, which may not be immediately obvious.
  • store the .diff file and the fixed version of the workflow (e.g. workflow-fixed.yml). The fixed workflow wouldn't actually be used for the tests, but would serve as useful context, showing what we expect the output to be. But this would also be vulnerable to drift over time: we might change the broken workflow and the diff and forget to update the fixed workflow.
    • This solution could be enhanced by adding a step to make all that runs git diff (again, assuming it exists) on the broken and fixed versions of the workflows and updates the respective .diff files; the CI/CD tests would then verify the files are in sync.

Let me know which options you prefer for these two questions and I'll send a PR taking them into account.


* One perhaps-notable difference between this output and an actual git diff is that git diff adds the "scope" after the @@ anchors (e.g. @@ -1,2 +3,4 @@ func foo() {), while my code does not. However, this data isn't actually used by git apply or patch, so the difference is harmless.

@pnacht pnacht added the kind/enhancement New feature or request label Mar 18, 2024