Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

authenticateHawk response has non-array scopes #3502

Closed
djmitche opened this issue Sep 14, 2020 · 22 comments · Fixed by #3798
Closed

authenticateHawk response has non-array scopes #3502

djmitche opened this issue Sep 14, 2020 · 22 comments · Fixed by #3798
Labels
bug Something isn't working

Comments

@djmitche
Copy link
Collaborator

https://sentry.prod.mozaws.net/operations/taskcluster-firefox-ci/issues/9581113/activity/?referrer=alert_email

Output schema validation error: 
Schema Validation Failed!
Rejecting Schema: https://firefox-ci-tc.services.mozilla.com/schemas/auth/v1/authenticate-hawk-response.json#
Errors:
  * data.scopes should be array
  ..(some other complaints about not matching the error arm of the `oneOf`)..

see more comment on the sentry issue.

@djmitche djmitche added the bug Something isn't working label Sep 14, 2020
@djmitche
Copy link
Collaborator Author

Searching by traceId (SO COOL!):

image

So, it looks like the authenticateHawk call was from queue.claimWork, was retried after just 200ms, and the retry was successful.

We do zillions of authenticateHawk calls, and this is the first time we've seen this issue.

@djmitche
Copy link
Collaborator Author

I think I see the issue -- we use scopes === null in a place where scopes is undefined.

@djmitche
Copy link
Collaborator Author

Hm, I don't think that's it. Looking more closely, this._clients has clients with undefined scopes, but this._clientCache is filled in a loop that sets scopes to null.

@djmitche
Copy link
Collaborator Author

BTW this occurred in v36.0.0, after 3fd060c and 9edca22 modified how the auth service interacts with the DB such that it uses the DB functions directly. But, scopes is calculated and not retrieved from the DB, so I'm not sure how that would be related. And a look at the diffs didn't reveal any smoking guns.

@djmitche
Copy link
Collaborator Author

Hm, I don't think that PR's putting debugging info at the right place. queue.claimWork is called with temporary credentials, so the returned scopes are from cert.scopes, not from the client object.

@djmitche
Copy link
Collaborator Author

The worker in this case was mobile-t-signing/mobile-t-signing-gk4fytmcssixdimll7vpq using static client project/releng/scriptworker/signing/prod/firefoxci-mobile-t. So my assertion in the previous comment that this call was made with temporary creds was incorrect.

@djmitche
Copy link
Collaborator Author

OK, I've spent way too long staring at this. Since these are permacreds, things are pretty simple. I've looked through the codepaths, and scopes are only manipulated in a few places. Several of those only occur with temporary credentials. For permacreds, it comes down to loadClient:

// Lazily expand client scopes
if (client.scopes === null) {
let scopes = this.resolve(client.unexpandedScopes);
client.scopes = scopes; // for createSignatureValidator compatibility
client.expandedScopes = scopes;
}

That calls the resolver:

return (inputs) => {
inputs = ScopeSetBuilder.normalizeScopeSet(inputs);
// Reduce input to the set of scopes starting with 'assume:'
const queue = inputs.filter(s => ASSUME_PREFIX.test(s));
// Check if we have an expansion of queue in LRU cache, if there is no
// such expansion we'll continue and compute one.
const cacheKey = queue.join('\n');
const cacheResult = lru.get(cacheKey);
if (cacheResult) {
return ScopeSetBuilder.mergeScopeSets(cacheResult, inputs);
}
// Build expansion of queue
const result = new ScopeSetBuilder();
for (const scope of queue) {
const input = scope.startsWith('assume:') ? scope.slice(7) : '*';
trie.execute(node, input, result);
}
// Store result in cache
const scopes = result.scopes();
lru.set(cacheKey, scopes);
return ScopeSetBuilder.mergeScopeSets(scopes, inputs);
};

and both return paths from the resolver return the result of mergeScopeSets. That function would probably choke on arguments that are null or undefined, since it calls .length and indexes them. It could probably survive being passed strings. In any case, it unconditionally returns an Array: it only sets local var scopes once, to [], and its only return path returns that variable. So the resolver certainly returns an Array.

So loadClient returns the client containing the scopes to loadCredentials:

const loadCredentials = async (clientId, ext) => {
// We may have two clientIds here: the credentialName (the one the caller
// sent in the Hawk Authorization header) and the issuingClientId (the one
// that signed the temporary credentials).
let credentialName = clientId,
issuingClientId = clientId;
// extract ext.certificate.issuer, if present
if (ext) {
ext = parseExt(ext);
if (ext.certificate && ext.certificate.issuer) {
issuingClientId = ext.certificate.issuer;
if (typeof issuingClientId !== 'string') {
throw new Error('ext.certificate.issuer must be a string');
}
if (issuingClientId === credentialName) {
throw new Error('ext.certificate.issuer must differ from the supplied clientId');
}
}
}
let accessToken, scopes, expires;
({ clientId, expires, accessToken, scopes } = await options.clientLoader(issuingClientId));
// apply restrictions based on the ext field
if (ext) {
({ scopes, expires, accessToken } = limitClientWithExt(
credentialName, issuingClientId, accessToken,
scopes, expires, ext, options.expandScopes));
}
return {
key: accessToken,
algorithm: 'sha256',
clientId: credentialName,
expires: expires,
scopes: scopes,
};
};

ext is not set here, so that destructures the client and returns the scopes unmodified to the signature validator:
return async function(req) {
let credentials, attributes, result, authResult;
try {
if (req.authorization) {
authResult = await hawk.server.authenticate({
method: req.method.toUpperCase(),
url: req.resource,
host: req.host,
port: req.port,
authorization: req.authorization,
}, async (clientId) => {
let ext = undefined;
// Parse authorization header for ext
let attrs = hawk.utils.parseAuthorizationHeader(
req.authorization,
);
// Extra ext
if (!(attrs instanceof Error)) {
ext = attrs.ext;
}
// Get credentials with ext
return loadCredentials(clientId, ext);
}, {
// Not sure if JSON stringify is not deterministic by specification.
// I suspect not, so we'll postpone this till we're sure we want to do
// payload validation and how we want to do it.
//payload: JSON.stringify(req.body),
// We found that clients often have time skew (particularly on OSX)
// since all our services require https we hardcode the allowed skew
// to a very high number (15 min) similar to AWS.
timestampSkewSec: 15 * 60,
});

Here hawk.server.authenticate is a part of the Hawk library and doesn't know a thing about scopes. It returns the result of its second argument as authResult.credentials
credentials = authResult.credentials;

That's then encoded into the API response shape and returned:
result = {
status: 'auth-success',
scheme: 'hawk',
expires: credentials.expires,
scopes: credentials.scopes,
clientId: credentials.clientId,
};

and the API method changes the expires format and returns it:
result = {
status: 'auth-success',
scheme: 'hawk',
expires: credentials.expires,
scopes: credentials.scopes,
clientId: credentials.clientId,
};

In all of that, my suspicion is that if (client.scopes === null) is the culprit and that .scopes wasn't null but also wasn't an Array. So, #3504 is probably a good PR after all.

@djmitche djmitche self-assigned this Sep 14, 2020
@djmitche djmitche added the blocked Blocked on something else label Sep 14, 2020
@djmitche
Copy link
Collaborator Author

Let's give this a few weeks between deployment and closing as can't-reproduce.

djmitche added a commit that referenced this issue Sep 15, 2020
#3502 - get more data when scopes is not an array
@djmitche
Copy link
Collaborator Author

Deployed to community today.

@djmitche
Copy link
Collaborator Author

djmitche commented Sep 21, 2020

This also occurred September 18 2020 22:10:08 UTC in firefox-ci (a total of two times now)

@petemoore
Copy link
Member

This also occurred September 18 2020 22:10:08 UTC in firefox-ci (a total of two times now)

Ah too bad that 658487e hasn't made it to production in firefox cluster yet. I see we're running 37.1.0 but the extra debugging went into 37.2.0.... Hopefully it will happen again once 37.2.0 or higher is deployed...

@djmitche
Copy link
Collaborator Author

djmitche commented Oct 7, 2020

Still last reported in 37.1.0 (and I don't see it at all for community). I'll give it another week or so.

@djmitche
Copy link
Collaborator Author

Still not seen since Sept 18. We can re-open if this gets spotted.

@djmitche djmitche reopened this Oct 27, 2020
@djmitche
Copy link
Collaborator Author

Occurred again at October 27 2020 10:24:08 UTC

@djmitche
Copy link
Collaborator Author

djmitche commented Oct 27, 2020

The assertion in #3504 did not fail!

@djmitche
Copy link
Collaborator Author

djmitche commented Oct 27, 2020

Here's the logging this time

{
  "timestamp": "2020-10-27T10:24:08.596573156Z",
  "Type": "monitor.error",
  "Fields": {
    "message": "Output schema validation error: \nSchema Validation Failed!\nRejecting Schema: https://firefox-ci-tc.services.mozilla.com/schemas/auth/v1/authenticate-hawk-response.json#\nErrors:\n  * data.scopes should be array\n  * data should NOT have additional properties: \"scheme\"\n  * data should NOT have additional properties: \"expires\"\n  * data should NOT have additional properties: \"scopes\"\n  * data should NOT have additional properties: \"clientId\"\n  * data should NOT have additional properties: \"hash\"\n  * data.status should be equal to one of the allowed values\n  * data should have required property 'message'\n  * data should match exactly one schema in oneOf",
    "incidentId": "5d75a682-63f6-4f9e-82f8-28a9b0c1876a",
    "params": {},
    "name": "Error",
    "schema": "https://firefox-ci-tc.services.mozilla.com/schemas/auth/https://firefox-ci-tc.services.mozilla.com/schemas/auth/v1/authenticate-hawk-response.json#",
    "payload": {
      "port": 443,
      "host": "firefox-ci-tc.services.mozilla.com",
      "sourceIp": "34.83.21.167",
      "authorization": "Hawk mac=\"***\", hash=\"9O/k6UTxx2c7Wq2ZG28XE/uXFX3WbGMARo7Tb0jgbCc=\", id=\"project/releng/scriptworker/beetmover/prod/firefoxci-gecko-3\", ts=\"1603794248\", nonce=\"GCW6fH\", ext=\"e30=\"",
      "resource": "/api/queue/v1/claim-work/scriptworker-k8s/gecko-3-beetmover",
      "method": "post"
    },
    "url": "/authenticate-hawk",
    "reportId": "2eaf16b208c9432981110a5fb0561c43",
    "method": "authenticateHawk",
    "stack": "Error: Output schema validation error: \nSchema Validation Failed!\nRejecting Schema: https://firefox-ci-tc.services.mozilla.com/schemas/auth/v1/authenticate-hawk-response.json#\nErrors:\n  * data.scopes should be array\n  * data should NOT have additional properties: \"scheme\"\n  * data should NOT have additional properties: \"expires\"\n  * data should NOT have additional properties: \"scopes\"\n  * data should NOT have additional properties: \"clientId\"\n  * data should NOT have additional properties: \"hash\"\n  * data.status should be equal to one of the allowed values\n  * data should have required property 'message'\n  * data should match exactly one schema in oneOf\n    at ServerResponse.res.reply (/app/libraries/api/src/middleware/schema.js:80:23)\n    at /app/services/auth/src/api.js:970:16\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:97:5)",
    "v": 1
  }
}
{
  "timestamp": "2020-10-27T10:24:08.597856875Z",
  "Type": "monitor.apiMethod",
  "Fields": {
    "public": true,
    "sourceIp": "::ffff:10.4.4.132",
    "statusCode": 500,
    "method": "POST",
    "query": {},
    "name": "authenticateHawk",
    "expires": "",
    "clientId": "",
    "resource": "/authenticate-hawk",
    "hasAuthed": false,
    "satisfyingScopes": [],
    "v": 1,
    "duration": 285.785589,
    "apiVersion": "v1"
  }
}
{
  "timestamp": "2020-10-27T10:24:08.853340003Z",
  "Type": "monitor.apiMethod",
  "Fields": {
    "query": {},
    "resource": "/authenticate-hawk",
    "duration": 6.161082,
    "v": 1,
    "expires": "",
    "hasAuthed": false,
    "method": "POST",
    "sourceIp": "::ffff:10.4.4.132",
    "satisfyingScopes": [],
    "statusCode": 200,
    "clientId": "",
    "name": "authenticateHawk",
    "public": true,
    "apiVersion": "v1"
  }
}
{
  "timestamp": "2020-10-27T10:24:28.857431344Z",
  "Type": "monitor.apiMethod",
  "Fields": {
    "satisfyingScopes": [
      "queue:claim-work:scriptworker-k8s/gecko-3-beetmover",
      "queue:worker-id:gecko-3-beetmover/gecko-3-beetmover-av4vicjstlu4shcl2hvm"
    ],
    "public": false,
    "hasAuthed": true,
    "name": "claimWork",
    "query": {},
    "resource": "/claim-work/scriptworker-k8s/gecko-3-beetmover",
    "clientId": "project/releng/scriptworker/beetmover/prod/firefoxci-gecko-3",
    "method": "POST",
    "v": 1,
    "apiVersion": "v1",
    "expires": "3019-11-07T00:02:07.485Z",
    "duration": 20753.919976,
    "statusCode": 200,
    "sourceIp": "34.83.21.167"
  }
}

@djmitche
Copy link
Collaborator Author

Interesting that this is from scriptworker both times!

@djmitche
Copy link
Collaborator Author

Full error stack:

Error: Output schema validation error:                                                                                               Schema Validation Failed!
Rejecting Schema: https://firefox-ci-tc.services.mozilla.com/schemas/auth/v1/authenticate-hawk-response.json#
Errors:
  * data.scopes should be array
  * data should NOT have additional properties: "scheme"
  * data should NOT have additional properties: "expires"
  * data should NOT have additional properties: "scopes"
  * data should NOT have additional properties: "clientId"
  * data should NOT have additional properties: "hash"
  * data.status should be equal to one of the allowed values
  * data should have required property 'message'
  * data should match exactly one schema in oneOf
    at ServerResponse.res.reply (/app/libraries/api/src/middleware/schema.js:80:23)
    at /app/services/auth/src/api.js:970:16
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

@djmitche
Copy link
Collaborator Author

Doing some experimenting with schema validation, I confirm that this is message corresponds to an auth-success response with a present but non-array scopes (string, null, undefined). But note that JSON serialization maps undefined to null, so {..., scopes: undefined} produces this error mesage.

@djmitche
Copy link
Collaborator Author

To be clear, in the previous comment I was able to reproduce the error message by making up fake response bodies. I was not able to reproduce the problem.

I'm drawing a blank here. Given that the assert from #3504 did not fire, we can assume that resolver.loadClient returned an object with scopes being an array. Tracing that back through @hapi/hawk and the signature validator, I don't see anywhere that .scopes is modified in this case (limitClientWithExt can modify scopes, but only if ext has specific properties, and from the logs ext is e30= ({}) here).

I'd love to have someone else take a look at this -- I must be missing something. Some issue with object aliasing? Something specific in the request that's causing this?

@djmitche djmitche removed their assignment Oct 27, 2020
@djmitche djmitche removed the blocked Blocked on something else label Oct 27, 2020
@djmitche
Copy link
Collaborator Author

@imbstack and I found the issue. The client returned by the scope resolver is a reference to the client object used internally in the ScopeResolver object. The resolver lazily computes the client's expanded scopes on-demand, putting the result in client.scopes.

When a client's last-used date needs to be updated, we await a database operation, so there's time for concurrent things to happen.

When there's a change to clients or roles, the _rebuildResolver method runs through all of the clients and sets client.scopes to null.

So we think what's happening is:

  • resolver lazily computes scopes, sets client.scopes to an array of strings
  • resolver calls DB to update last-used date
  • _rebuildResolver sets client.scopes to null
  • DB call returns, signatureValidator validates the signature, returns client.scopes -- but that's null
  • error gets thrown

@imbstack will be putting up a PR shortly, to shallow-clone the returned value and avoid this issue.

@petemoore
Copy link
Member

Nice sleuthing! I'll take a closer look tomorrow, out for the day now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants