workers should support explicit, verified inputs with a manifest #89

Open
escapewindow opened this issue Aug 15, 2017 · 8 comments

Comments

@escapewindow
Contributor

We have a number of inputs that go into Gecko tasks:

  • docker image
  • tooltool artifacts
  • toolchain artifacts
  • previous builds
  • pypi / npm / etc modules

and we define those in various ways: requirements files, tooltool files, docker image task definition locations, env vars, etc. Having to audit or verify the inputs to a task is a very complex ask right now.

If we could define explicit inputs to a task, then:

  • worker downloads inputs
  • for any given shas, verify shas
  • for any given pubkeys, verify signatures
  • use the docker image downloaded once it passes verification
  • pass the other artifacts into the task environment
  • we can upload an inputs manifest with the above information

That's much easier to audit. It could also be a first step towards limiting outbound traffic once the task starts. This reminds me of @petemoore's inputs/outputs to tasks proposal, where tasks can be chained like command-line pipes, although it's not one-dimensional (many-to-many piping).
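As a rough illustration of the worker-side flow above, here is a minimal Python sketch that checks each downloaded input against a declared sha256 and emits a manifest. The entry shape (`name`, `sha256`), the `verified` flag, and the function names are assumptions made up for this example, not an existing worker API; signature verification is left out entirely.

```python
import hashlib


class InputVerificationError(Exception):
    """Raised when a downloaded input does not match its declared digest."""


def sha256_hex(data: bytes) -> str:
    """Return the hex sha256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_inputs(declared, downloaded):
    """Check downloaded inputs against declared sha256 values.

    declared:   list of dicts like {"name": ..., "sha256": ...};
                "sha256" is optional (hypothetical schema).
    downloaded: dict mapping input name -> bytes fetched by the worker.
    Returns a manifest: one dict per input with the sha actually seen.
    """
    manifest = []
    for entry in declared:
        name = entry["name"]
        actual = sha256_hex(downloaded[name])
        expected = entry.get("sha256")
        if expected is not None and expected != actual:
            # Fail before the task ever sees the unexpected input.
            raise InputVerificationError(
                f"{name}: expected sha256 {expected}, got {actual}"
            )
        manifest.append(
            {"name": name, "sha256": actual, "verified": expected is not None}
        )
    return manifest
```

The returned manifest could then be uploaded as an artifact, so an auditor sees both which inputs were used and which of them were actually checked.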

@djmitche
Contributor

I'm not sure what this means. I don't think we could limit the inputs a task could consume.

Can you make a more detailed proposal?

@escapewindow
Contributor Author

If we have this, we could potentially then set up the firewall to disallow outbound connections during the task to force limiting the inputs a task could consume. That's not a requirement, but this RFC would be a first step towards being able to do that.

What details would you like?

Essentially, CoT verification of inputs is always going to be a patchy, hacky thing as long as the task can download inputs in any way: mh configs, env vars, mach commands, etc. By pre-defining this in the task definition in a standardized way, we can allow for a standardized verification.

One way would be to standardize on task.payload.upstreamArtifacts, which currently only supports artifacts from other tasks... e.g.

[{
  "taskId": "upstream-task-id",
  "taskType": "build",  # for cot verification purposes
  "paths": ["path/to/artifact1", "path/to/artifact2"],
  ...  # we can add more key/value pairs to the schema; currently we have `formats` which is for signing
}, {
  ...
}]
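To show how a verifier might consume a list shaped like this, here is a small Python sketch that flattens it into one record per artifact. Only the `taskId`, `taskType`, and `paths` keys come from the example above; the function name is hypothetical.

```python
def flatten_upstream_artifacts(upstream_artifacts):
    """Yield one (taskId, taskType, path) tuple per declared artifact."""
    for entry in upstream_artifacts:
        for path in entry["paths"]:
            # taskType is treated as optional here; that is an assumption.
            yield entry["taskId"], entry.get("taskType"), path


example = [
    {
        "taskId": "upstream-task-id",
        "taskType": "build",
        "paths": ["path/to/artifact1", "path/to/artifact2"],
    },
]

for task_id, task_type, path in flatten_upstream_artifacts(example):
    print(task_id, task_type, path)
```

Having every input in one flat, uniform list is what makes a standardized verification pass possible: each tuple can be checked the same way regardless of which kind of task produced it.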

@djmitche
Contributor

What I don't understand is, we can write as much as we want in the task definition, but the task is still arbitrary code and can do what it wants. We can firewall a little, but most stuff we talk to is on Heroku or S3 or EC2 so that's a pretty blunt instrument. It certainly couldn't, for example, limit to tooltool artifacts with a particular hash or artifacts from a whitelist of taskIds.

@escapewindow
Contributor Author

Sure, someone can add something rogue. But for the official inputs, e.g. the build for a repackage task, or the complete mars for a partial regeneration task, we can put them in and make sure that their shas/sigs have not been modified before download. Otherwise we have to put in breadcrumbs for

  • repackage: "this is the sha I downloaded"
  • build: "this is the sha I uploaded"
  • signing: "let me compare the two shas"

I'd rather know that repackage would die if it downloaded a different sha from the build's upload.

@escapewindow
Contributor Author

> I'd rather know that repackage would die if it downloaded a different sha from the build's upload.

This could be done through something like the artifact service I've seen comments about. Scopes could potentially allow someone to modify that, but signing could then compare the sha from the build task's CoT artifact with the sha from the artifact service, rather than having to download and verify every artifact. For signing to know that it needs to verify that artifact's sha, there would need to be a standardized way for repackage to say "build's artifact is an input I rely on", rather than adding checks for all the various config methods of specifying inputs that we have today.

@djmitche
Contributor

So if I can boil this down, the idea is to have a standard way for a task to describe its "upstream" inputs. This would make CoT verification easier, and also make task implementation easier since the worker would download those inputs before beginning the task.

Generic-worker supports something like this in the mounts property - it can even unzip or untar things.

@escapewindow can you have a look at that functionality? If that suits, then maybe the proposal here is to replicate that functionality in taskcluster-worker and, when everything is using taskcluster-worker - #10 - start switching in-tree tasks to use that approach. I think @petemoore was interested in "plugging" tasks together this way (which probably explains why generic-worker has this support!)

@escapewindow
Contributor Author

I think mounts is good, in that it downloads something for the task. It seems to be missing the sha verification. The only ways I can currently think of doing that are:

a) an artifacts service that stores and returns artifact shas (we'd need to verify no one is modifying the artifacts + shas at rest), or
b) referring to the chain of trust artifact. If we verify the CoT sha + signature, that's one way to verify the artifact and sha haven't been modified at rest.

If (a), we may want to record our mounts' paths and shas in the chain of trust artifact, so the scriptworker verification step can compare the downloaded sha vs the uploaded sha. If (b), we're moving towards the taskcluster-supported workers verifying the chain of trust artifact, which we may want at some point anyway.
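The comparison step in either option could look roughly like the Python sketch below. It assumes the downstream task records a `path -> sha256` mapping for what it downloaded, and that the upstream chain of trust artifact exposes a `path -> {"sha256": ...}` mapping for what it uploaded; both shapes and the function name are assumptions for illustration.

```python
def find_sha_mismatches(downloaded, uploaded):
    """Compare shas a downstream task fetched with what upstream recorded.

    downloaded: dict path -> sha256 the downstream task says it fetched.
    uploaded:   dict path -> {"sha256": ...} from the upstream's record
                (assumed shape).
    Returns a dict path -> (downloaded_sha, uploaded_sha_or_None) for
    every path that is missing upstream or whose shas differ.
    """
    problems = {}
    for path, sha in downloaded.items():
        recorded = uploaded.get(path, {}).get("sha256")
        if recorded != sha:
            problems[path] = (sha, recorded)
    return problems
```

An empty result means every declared input matches the upstream record; anything else would be a reason for the verification step to fail the task.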

@petemoore
Member

For artifact SHA validation, we can update generic-worker (and other workers) to use the new Artifact API from #7 that jhford has been working on. This should take care of SHA validation of taskcluster artifacts.

For URL SHA validation, we could make the SHA-256 (or other algorithm) value an optional parameter in the payload, so that the task fails if the checksum is incorrect. Chain of trust could then assert that the checksum(s) are included in the payload. That gives us the flexibility not to force every task definition to state checksums, while still letting us enforce them via chain of trust where we want to.
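The chain-of-trust side of that idea might be sketched in Python as a policy check over the payload. The payload layout used here (`mounts` entries with a `content` object carrying `url` and an optional `sha256`) is an assumed shape for illustration, as is the function name; it is not taken from a worker's actual schema.

```python
def mounts_missing_checksums(payload):
    """Return the URLs of URL-based mounts that declare no sha256.

    The worker would treat sha256 as optional; a stricter verifier can
    call this and reject the task if the returned list is non-empty.
    """
    missing = []
    for mount in payload.get("mounts", []):
        content = mount.get("content", {})
        if "url" in content and not content.get("sha256"):
            missing.append(content["url"])
    return missing


payload = {
    "mounts": [
        {"content": {"url": "https://example.com/a.tar.gz", "sha256": "ab" * 32}},
        {"content": {"url": "https://example.com/b.zip"}},
    ]
}
print(mounts_missing_checksums(payload))
```

Keeping the enforcement in the verifier rather than the worker schema matches the flexibility described above: ordinary tasks can omit checksums, while tasks under chain of trust cannot.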
