-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor status handler, remove taskDefined handler #5665
Conversation
This looks good, but I'm wondering how best we can test the various types of race conditions that caused us pains before, aside from testing it in production. For example, could we deploy it to staging before releasing it, and then mirror production traffic to staging, to see how it handles things? I'm guessing it is going to be pretty tricky to mock production like conditions without building a big framework to emulate heavy load etc. |
I will be testing it on dev.alpha right now, with the single connected repo to confirm that refactoring still works. Goal of this refactoring were to avoid concurrent handlers trying to create/update check runs for a single task. Now, both events will be sent to the same queue, and as a result only one event will be handled at a time. |
c95f390
to
a63ad93
Compare
cc492d6
to
d031f11
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookin good!
const taskDefinition = await this.queueClient.task(taskId); | ||
const fetchArtifact = async (artifactPath) => { | ||
if (taskDefined || !runId) { | ||
// when task is being defined, there will be no artifacts, so we fake the call and return empty response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
build, | ||
scopes: taskDefinition.scopes, | ||
}); | ||
const [liveLogText, customCheckRunText, customCheckRunAnnotationsText ] = await Promise.all([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const [liveLogText, customCheckRunText, customCheckRunAnnotationsText ] = await Promise.all([ | |
const [ liveLogText, customCheckRunText, customCheckRunAnnotationsText ] = await Promise.all([ |
d41ea41
to
e47b721
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple small suggestions! But overall - 💪🏻 💪🏻
changelog/ZuGEp6tWRWW4ED2sVXTkuQ.md
Outdated
|
||
Refactored github status checks handler to do handle task status transitions in single place. | ||
|
||
Previous implementaition relied on two handlers: taskDefined and statusChanged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous implementaition relied on two handlers: taskDefined and statusChanged. | |
Previous implementation relied on two handlers: taskDefined and statusChanged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! :)
|
||
const callHandler = (name, handler) => message => { | ||
// handler returned must be async, as timedHandler will not be able to time it correctly | ||
const callHandler = (name, handler) => message => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🦅 👁️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this caused function to return early, and all monitor.timedHandler()
were returning over-optimistic durations :)
/** | ||
* Github has a limit of 64Kb for the whole payload | ||
*/ | ||
getRemainingMaxSize() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will ensure that if someone includes custom annotations, we will consider its size too
63dff6d
to
530a832
Compare
const { reasonResolved } = runs[runId] || {}; | ||
const taskDefined = state === undefined; | ||
|
||
await qLock.acquire(taskId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this ensures we only run single status update for a given task. Consequent handlers will pause here until the previous is done.
530a832
to
fe0bdb8
Compare
} | ||
|
||
release(name) { | ||
const nextResolver = this.queue[name].shift(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only part that can cause issues, if client code doesn't call .release()
all pending callers will be stuck.
At the moment status handler ensures that it always calls it before the end.
For the future we might want to implement some fail-safe watchdog timers to force-release lock after some timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if a timer is the right fix here. If something is stuck and taking forever (like the GitHub API), probably waiting for it to fail is the right approach. The risk is that a return
gets added to the function without a corresponding release
. One way to avoid that is to have the acquire
take an async callback which is called with the lock held, and unconditionally release the lock when the callback returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was also thinking about this callback approach, but didn't want to make code more complicated on the handler side
services/github/src/queue-lock.js
Outdated
/** | ||
* Implements locked queue to allow one routine running at a time | ||
*/ | ||
class QueuedLock { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if something from https://github.com/sindresorhus/promise-fun would serve this purpose, or at least could be used for the map values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to find anything that matched my use-case
I'm not feeling too confident about being a good reviewer for this change. @djmitche Do you have any thoughts on this? |
@@ -5,7 +5,6 @@ defaults: | |||
deprecatedResultStatusQueue: 'stat-result' | |||
deprecatedInitialStatusQueue: 'stat-init' | |||
resultStatusQueue: 'ch-result' | |||
initialStatusQueue: 'ch-init' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll also need to drop this queue manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty solid -- my notes aren't anything serious.
How clear is it to someone deploying TC that they can't run more than one handler process? I suspect that previously this wasn't necessary but wasn't expressly forbidden. Now it must be forbidden.
let debug = makeDebug(this.monitor, { taskGroupId, taskId }); | ||
debug(`Handling state change for task ${taskId} in group ${taskGroupId}, reason=${reasonResolved || state}`); | ||
let debug = makeDebug(this.monitor, { taskGroupId, taskId, id: `id-${counter}` }); | ||
counter += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be an instance variable instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! Realized that it is even better to move it to makeDebug()
so it would be applied to all handlers using this utility function, and not only for status handler 👍
} | ||
|
||
release(name) { | ||
const nextResolver = this.queue[name].shift(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if a timer is the right fix here. If something is stuck and taking forever (like the GitHub API), probably waiting for it to fail is the right approach. The risk is that a return
gets added to the function without a corresponding release
. One way to avoid that is to have the acquire
take an async callback which is called with the lock held, and unconditionally release the lock when the callback returns.
services/github/src/queue-lock.js
Outdated
/** | ||
* Implements locked queue to allow one routine running at a time | ||
*/ | ||
class QueuedLock { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if something from https://github.com/sindresorhus/promise-fun would serve this purpose, or at least could be used for the map values?
fe0bdb8
to
5239527
Compare
Changes the way how status handlers are being called after changes introduced in #5665. Consumer handlers need to stay sync and return early to allow queue send new messages. This also improves timedHandler usage for handlers and adds periodic debug stats: number of running/error/total handlers by queue. Fixes #5728
Changes the way how status handlers are being called after changes introduced in #5665. Consumer handlers need to stay sync and return early to allow queue send new messages. This also improves timedHandler usage for handlers and adds periodic debug stats: number of running/error/total handlers by queue. Fixes #5728
Changes the way how status handlers are being called after changes introduced in #5665. Consumer handlers need to stay sync and return early to allow queue send new messages. This also improves timedHandler usage for handlers and adds periodic debug stats: number of running/error/total handlers by queue. Fixes #5728
This resolves an issue where two different handlers were trying to check/create/update check_run at the same time, which led to several check_runs being created and status reports were being sent to the wrong one.
Now single status handler manages all state transitions including task definition, running, and all of the completion states.
Besides that, if multiple events try to update same
taskId
, it will be ensured that they are executed strictly in order to not overwrite future events or create multiple check runs.This also removes
/rerequested
functionality for re-run tasks, as this feature is broken on github side, status remains forever completed no matter what. Instead, new check run is being created if task was rerun.