New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(ngcc): allow immediately reporting a stale lock file #37250
perf(ngcc): allow immediately reporting a stale lock file #37250
Conversation
packages/compiler-cli/ngcc/src/locking/lock_file_with_child_process/index.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @terencehonles - thanks for sending in this PR. It is a great idea.
Given the SyncLocker
already bails out immediately when the lockfile is not free then this is actually only a concern of the AsyncLocker
. So I think we should move the logic there, where we are already doing a catch of the error and checking for EEXIST
.
Also I would be keen to make the check for whether the referenced process is still running to a separate method of AsyncLocker
:
private isRunning(pid: string): boolean {
try {
return process.kill(+pid, 0);
} catch (e) {
// if there is a problem checking if the process is running throw the original error.
if (e.code !== 'ESRCH') {
throw e;
}
}
return false;
}
@petebacondarwin sounds good! I was contemplating breaking it out to a function, but was avoiding changing the surface of the The other reason I put it in the |
I agree that the underlying implementation of the lockfile are leaking into the lockers already (e.g. the The main reason for abstracting it out was to allow the two different lockers, rather than having multiple different lock files. If we decided to go that way then I think we would need to move more into the lockfile and out of the lockers anyway. |
4794839
to
76ee8e0
Compare
@petebacondarwin yup, anyways I refactored the PR. I noticed a lint error from before and figured out how to run the formatter, and I also noticed the build failed because |
Probably stating the obvious (but just in case): We need to ensure that this works on Linux, macOS and Windows. |
76ee8e0
to
4cab9e9
Compare
Yup, I'm on Linux and macOS should be the same. From what I can tell it does seem like node emulates |
ab0808f
to
4072f36
Compare
Ok, all the tests should be passing at this point. |
4072f36
to
e7af3af
Compare
GitHub seems to be having issues (I believe some of the tests failed to run, not the tests failed). Trying to shake things free. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of NITs - but this looks good. We need to check that it works on Windows...
} | ||
return lockFileContents; | ||
}); | ||
spyOn(process, 'kill').and.callFake(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jasmine now has a .andThrowError()
method. (https://jasmine.github.io/2.0/introduction.html#section-Spies:_%3Ccode%3Eand.throwError%3C/code%3E)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I used initially, but it wasted a bunch of my time 😒. The .andThrowError
function takes a string or Error object, and since {code: 'ESRCH'}
is neither it will convert it to a string ('[object Object]'
😒 ) which then will easily confuse you on why the test is failing, but it really should be working.
...and yes, I ended up having to looking for the source code for Jasmine, which annoyingly their documentation doesn't actually seem to point out the signatures for their test functions.
If you would like me to change the code I can, but it would probably need to be something like .andThrowError(Object.assign(new Error(), {code: 'ESRCH'}))
Sadly JavaScript still has some way to go till it is as nice as writing in a more mature language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh that is a pain... Happy to leave this as-is unless you wanted to make a helper function:
function createNodeError(code: string, message?: string) {
const e = new Error(message);
e.code = code;
return e;
}
And then do something like:
.andThrowError(createNodeError('ESRCH'));
(I haven't checked whether that works!)
Happy either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I slightly doubt the helper function would be reused 😕 so I think it may be better to just leave it as is.
I may look at submitting a patch to Jasmine so that if it's not strictly a string it doesn't try to construct an Error around the value passed in. That way this code could be changed in the future, or other people who are trying to do the same thing aren't bitten by the same problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created PR jasmine/jasmine#1822 so hopefully they like the suggestion.
I tried this out on my Windows box and it appears to work! 🎉 |
e6b104f
to
c8e12eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being pedantic: this is marked as "feat" but targeting patch (which is 10.RC at the moment), so perhaps "perf" would be more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of optional nits. LGTM otherwise ✨
} | ||
|
||
throw new TimeoutError(this.lockFileMessage( | ||
`Lock found, but no process with PID ${pid} seems to be running.`)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, it is still possible to have a race condition resulting in an inaccurate message:
- Say
pid: 1
is running solockFile.write()
fails at the top of thefor
loop. - We read the PID from the lock file and set the
pid
variable to1
. - Since
attempts
is 0, we callisProcessRunning(1)
. - Now, let's say that the process has ended between the last time we read the lock-file to get the PID and us calling
isProcessRunning(1)
. - In this case, it is (theoretically) possible that another process has started and created a lock-file. Thus, the
lockFile.write()
call right afterisProcessRunning(1)
will fail (but not because the original PID (1) is still in the lock-file, as the error message will imply).
I admit that the likelihood of this race condition happening is too low (and could potentially be ignored), but we could avoid it by doing an extra check that pid === this.lockFile.read()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't even do pid === this.lockFile.read()
, since this could also fall foul of a race condition. (Even more unlikely).
Perhaps the best option is to change the error message to accept that it might be that this is the case and still bail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I contemplated doing another read for a PID change, but I figured it was rather unlikely, and the point of this PR was more to bail out and give a helpful message as to what might be happening. I can do the last ditch read if that might be useful, but just to be clear the complete message is the following:
Lock found, but no process with PID ${pid} seems to be running.
(If you are sure no ngcc process is running then you should delete the lock-file at ${this.lockFile.path}.)
and I felt that it was probably clear enough that there is a chance this was not correct (and why I didn't have ngcc
automatically try to clean up the lock file).
I'm fine either way, but I'd probably need a suggestion on what to change the message to if that's the direction we want to take things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't even do
pid === this.lockFile.read()
, since this could also fall foul of a race condition. (Even more unlikely).
I would still prefer the less likely race condition 😁
But I don't feel too strongly about it, given the likelyhood is quite low anyway. Let's leave it as is (I doubt anyone will run into it often enough to be bothered 😛).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I was rebasing the change, I added a final .read()
to check the PID remained the same.
This makes sense. @terencehonles can you change this? |
c8e12eb
to
b93b979
Compare
Done, I'll let you or @gkalpak chime in on the remaining comments. |
If @gkalpak doesn't have any more suggestions by tomorrow night, I'll mark this for merge. |
@terencehonles - the CI failure appears to be related to a built files size regression in one of integration tests. I don't think this PR should affect that. Can you try rebasing on master? |
29a66ea
to
3d35d88
Compare
@petebacondarwin done, and almost forgot to push my change for #37250 (comment) 😅 |
Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up.
3d35d88
to
f8b6c2a
Compare
Looks like I probably needed 1b55da1 to fix the size limits |
Good work @terencehonles - CI is green! We are ready to merge. |
Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up. PR Close #37250
) Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up. PR Close angular#37250
This issue has been automatically locked due to inactivity. Read more about our automatic conversation locking policy. This action has been performed automatically by a bot. |
) Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up. PR Close angular#37250
Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up.
PR Checklist
Please check if your PR fulfills the following requirements:
PR Type
What kind of change does this PR introduce?
What is the current behavior?
ngcc
will wait on a lock file that will never be resolved until the maximum timeout is reached, and only then will it tell the user the path to delete the lock file.Issue Number: N/A
What is the new behavior?
ngcc
will exit immediately, and report the path to the lock file, and let the user decide if it should be deleted.Does this PR introduce a breaking change?