workers should support explicit, verified inputs with a manifest #89

escapewindow · 2017-08-15T18:49:30Z

We have a number of inputs that go into Gecko tasks:

docker image
tooltool artifacts
toolchain artifacts
previous builds
pypi / npm / etc modules

and we define those in various ways: requirements files, tooltool files, docker image task definition locations, env vars, etc. Having to audit or verify the inputs to a task is a very complex ask right now.

If we could define explicit inputs to a task,

worker downloads inputs
for any given shas, verify shas
for any given pubkeys, verify signatures
use the docker image downloaded once it passes verification
pass the other artifacts into the task environment
we can upload an inputs manifest with the above information

That's much easier to audit. It also could be the initial steps towards limiting outbound traffic once the task starts. This reminds me of @petemoore 's inputs/outputs to tasks proposal... where tasks can be chained like commandline pipes, although it's not one-dimensional (many-to-many piping).
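To make the verify-then-manifest steps concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `verify_inputs`, the shape of the `inputs` list, and the manifest fields are hypothetical, not an existing worker API. It assumes the inputs are already downloaded and leaves out signature verification.

```python
import hashlib


def sha256_of(path):
    """Return the hex sha256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_inputs(inputs):
    """Check each downloaded input against its declared sha, if one was given.

    `inputs` is a hypothetical list of {"name", "path", "sha256"} dicts, where
    "sha256" may be None or absent. Raises ValueError on the first mismatch;
    otherwise returns a manifest (a list of dicts) suitable for upload.
    """
    manifest = []
    for item in inputs:
        actual = sha256_of(item["path"])
        expected = item.get("sha256")
        if expected is not None and expected != actual:
            raise ValueError(f"sha256 mismatch for {item['name']}")
        manifest.append({
            "name": item["name"],
            "sha256": actual,
            # Record whether a digest was declared, so auditors can tell
            # verified inputs apart from merely recorded ones.
            "verified": expected is not None,
        })
    return manifest
```

The idea is that the task only starts if `verify_inputs` returns, and the returned list is what would be uploaded as the inputs manifest.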

djmitche · 2017-08-24T14:21:02Z

I'm not sure what this means. I don't think we could limit the inputs a task could consume.

Can you make a more detailed proposal?

escapewindow · 2017-08-24T14:44:31Z

If we have this, we could potentially then set up the firewall to disallow outbound connections during the task, which would enforce limits on the inputs a task could consume. That's not a requirement, but this RFC would be a first step towards being able to do that.

What details would you like?

Essentially, CoT verification of inputs is always going to be a patchy, hacky thing as long as the task can download inputs in any way: mh configs, env vars, mach commands, etc. By pre-defining this in the task definition in a standardized way, we can allow for a standardized verification.

One way would be to standardize on task.payload.upstreamArtifacts, which currently only supports artifacts from other tasks... e.g.

[{
  "taskId": "upstream-task-id",
  "taskType": "build",  # for cot verification purposes
  "paths": ["path/to/artifact1", "path/to/artifact2"],
  ...  # we can add more key/value pairs to the schema; currently we have `formats` which is for signing
}, {
  ...
}]
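As a rough sketch of what "standardized verification" could look like against a list shaped like the one above: flatten the entries into (taskId, path) pairs and flag anything a task used that it never declared. The helper names and the `used` list are hypothetical; only `taskId` and `paths` come from the example.

```python
def declared_inputs(upstream_artifacts):
    """Flatten upstreamArtifacts-style entries into a set of (taskId, path) pairs."""
    return {
        (entry["taskId"], path)
        for entry in upstream_artifacts
        for path in entry["paths"]
    }


def undeclared_inputs(upstream_artifacts, used):
    """Return the (taskId, path) pairs in `used` that were never declared.

    `used` is a hypothetical record of what the task actually consumed.
    """
    return sorted(set(used) - declared_inputs(upstream_artifacts))
```

With the declarations in one place, the verifier walks a single list rather than parsing each config mechanism separately.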

djmitche · 2017-08-24T14:50:13Z

What I don't understand is, we can write as much as we want in the task definition, but the task is still arbitrary code and can do what it wants. We can firewall a little, but most stuff we talk to is on Heroku or S3 or EC2 so that's a pretty blunt instrument. It certainly couldn't, for example, limit to tooltool artifacts with a particular hash or artifacts from a whitelist of taskIds.

escapewindow · 2017-08-24T14:52:56Z

Sure, someone can add something rogue. But for the official inputs, e.g. the build for a repackage task, or the complete mars for a partial regeneration task, we can put them in and make sure that their shas/sigs have not been modified before download. Otherwise we have to put in breadcrumbs for

repackage: "this is the sha I downloaded"
build: "this is the sha I uploaded"
signing: "let me compare the two shas"

I'd rather know that repackage would die if it downloaded a different sha from the build's upload.

escapewindow · 2017-08-24T15:39:20Z

> I'd rather know that repackage would die if it downloaded a different sha from the build's upload.

This could be through something like the artifact service I've seen comments about. Scopes could potentially allow for modifying that, but signing could then compare the sha from the build task's CoT artifact against the sha from the artifact service, rather than having to download and verify every artifact. For signing to know that it needed to verify that artifact's sha, it would need a standardized way for repackage to say "build's artifact is an input I rely on", rather than adding in checks for all the various config methods of specifying inputs that we have today.
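A minimal sketch of that comparison, assuming signing already has two path-to-sha mappings in hand: one from the build task's CoT artifact and one from the artifact service. Both mappings, the `declared_paths` list, and the function name are hypothetical; no real artifact-service API is implied.

```python
def find_sha_mismatches(declared_paths, cot_shas, service_shas):
    """Return declared paths whose sha is missing from either source or differs.

    `cot_shas` and `service_shas` are hypothetical {path: sha} dicts.
    Nothing is downloaded; only the recorded digests are compared.
    """
    mismatches = []
    for path in declared_paths:
        cot = cot_shas.get(path)
        svc = service_shas.get(path)
        if cot is None or svc is None or cot != svc:
            mismatches.append(path)
    return mismatches
```

An empty result would mean every input repackage declared has the same sha in both places; anything else should fail verification.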

djmitche · 2017-09-29T21:29:08Z

So if I can boil this down, the idea is to have a standard way for a task to describe its "upstream" inputs. This would make CoT verification easier, and also make task implementation easier since the worker would download those inputs before beginning the task.

Generic-worker supports something like this in the mounts property - it can even unzip or untar things.

@escapewindow can you have a look at that functionality? If that suits, then maybe the proposal here is to replicate that functionality in taskcluster-worker and, when everything is using taskcluster-worker - #10 - start switching in-tree tasks to use that approach. I think @petemoore was interested in "plugging" tasks together this way (which probably explains why generic-worker has this support!)

escapewindow · 2017-09-29T23:06:29Z

I think mounts is good, in that it downloads something for the task. It seems to be missing the sha verification. The only ways I can currently think of doing that are:

a) an artifacts service that stores and returns artifact shas (we'd need to verify no one is modifying the artifacts + shas at rest), or
b) referring to the chain of trust artifact. If we verify the CoT sha + signature, that's one way to verify the artifact and sha haven't been modified at rest.

If (a), we may want to record our mounts' paths and shas in the chain of trust artifact, so the scriptworker verification step can compare the downloaded sha vs the uploaded sha. If (b), we're moving towards the taskcluster-supported workers verifying the chain of trust artifact, which we may want at some point anyway.

petemoore · 2017-10-02T13:57:26Z

For artifact SHA validation, we can update generic-worker (and other workers) to use the new Artifact API from #7 that jhford has been working on. This should take care of SHA validation of taskcluster artifacts.

For url SHA validation, we could make the sha-256 (or other algorithm) value an optional parameter in the payload, such that the task fails if the checksum is incorrect. Chain of trust could then assert that the checksum(s) are included in the payload. That gives us the flexibility not to force every task definition to state checksum(s), while still letting us enforce them via chain-of-trust where we want to.
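The chain-of-trust assertion could be sketched roughly like this in Python. The payload shape here (a `mounts` list whose `content` holds a `url` and an optional `sha256`) is an assumption for illustration, not the actual generic-worker schema, and `urls_missing_checksum` is a hypothetical helper.

```python
def urls_missing_checksum(payload):
    """Return the URLs in the payload's mounts that carry no sha256 value.

    Assumes a hypothetical shape:
    {"mounts": [{"content": {"url": ..., "sha256": ...}}, ...]}.
    Mounts without a "url" (e.g. task artifacts) are skipped here.
    """
    missing = []
    for mount in payload.get("mounts", []):
        content = mount.get("content", {})
        if "url" in content and not content.get("sha256"):
            missing.append(content["url"])
    return missing
```

The worker itself would stay permissive, and chain of trust would reject a task only when this list is non-empty for task types where checksums are mandatory.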

djmitche mentioned this issue Jun 13, 2018

Allow tasks to depend on artifacts #121

Open
