perf(ngcc): allow immediately reporting a stale lock file #37250

terencehonles · 2020-05-22T04:35:30Z

Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up.

PR Checklist

Please check if your PR fulfills the following requirements:

The commit message follows our guidelines: https://github.com/angular/angular/blob/master/CONTRIBUTING.md#commit
Tests for the changes have been added (for bug fixes / features)
Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

Bugfix
Feature
Code style update (formatting, local variables)
Refactoring (no functional changes, no api changes)
Build related changes
CI related changes
Documentation content changes
angular.io application / infrastructure changes
Other... Please describe:

What is the current behavior?

ngcc will wait on a lock file that will never be resolved until the maximum timeout is reached, and only then will it tell the user the path to delete the lock file.

Issue Number: N/A

What is the new behavior?

ngcc will exit immediately, and report the path to the lock file, and let the user decide if it should be deleted.

Does this PR introduce a breaking change?

Yes
No

packages/compiler-cli/ngcc/src/locking/lock_file_with_child_process/index.ts

petebacondarwin

Hi @terencehonles - thanks for sending in this PR. It is a great idea.
Given the SyncLocker already bails out immediately when the lockfile is not free then this is actually only a concern of the AsyncLocker. So I think we should move the logic there, where we are already doing a catch of the error and checking for EEXIST.

Also I would be keen to make the check for whether the referenced process is still running to a separate method of AsyncLocker:

private isRunning(pid: string): boolean {
  try {
    return process.kill(+pid, 0);
  } catch (e) {
    // if there is a problem checking if the process is running throw the original error.
    if (e.code !== 'ESRCH') {
      throw e;
    }
  }
  return false;
}

terencehonles · 2020-05-22T16:49:25Z

@petebacondarwin sounds good! I was contemplating breaking it out to a function, but was avoiding changing the surface of the LockFile interface. I had noticed the sync locker didn't retry, but I figured that might be coming from somewhere else. I guess not, and I realize now, that I ran into this because I was building an angular project "experimentally" instead of running ngcc like our normal build tooling does.

The other reason I put it in the LockFile implementation is that implementation is the one reading/writing the lock file and describing how the lock is actually implemented including its contents. I was thinking there could be a case where the lock is being requested over the network or in some space that the PID wouldn't be found anymore, but still be valid. I'm probably overthinking things, and the lockers both do know enough about the internal implementation that I can see that the better process boundary would be at the locker's .lock(...) method. I still think the actual contents of the lock file is more "owned" by the LockFile itself, but I don't mind moving the code.

petebacondarwin · 2020-05-22T16:52:24Z

I agree that the underlying implementation of the lockfile are leaking into the lockers already (e.g. the EEXIST error.

The main reason for abstracting it out was to allow the two different lockers, rather than having multiple different lock files. If we decided to go that way then I think we would need to move more into the lockfile and out of the lockers anyway.

terencehonles · 2020-05-22T17:54:31Z

@petebacondarwin yup, anyways I refactored the PR. I noticed a lint error from before and figured out how to run the formatter, and I also noticed the build failed because process.kill is not typed to return boolean. I believe the cast I did and the changes should be satisfactory, but I'll poke at the tests when they run and hopefully they just pass 🙂

gkalpak · 2020-05-22T19:02:17Z

Probably stating the obvious (but just in case): We need to ensure that this works on Linux, macOS and Windows.

terencehonles · 2020-05-22T19:07:38Z

Probably stating the obvious (but just in case): We need to ensure that this works on Linux, macOS and Windows.

Yup, I'm on Linux and macOS should be the same. From what I can tell it does seem like node emulates kill(pid, 0) for Windows, but I don't have a machine to easily test against.

terencehonles · 2020-05-22T22:53:38Z

Ok, all the tests should be passing at this point.

terencehonles · 2020-05-22T23:37:36Z

Ok, all the tests should be passing at this point.

GitHub seems to be having issues (I believe some of the tests failed to run, not the tests failed). Trying to shake things free.

petebacondarwin

A couple of NITs - but this looks good. We need to check that it works on Windows...

packages/compiler-cli/ngcc/src/locking/async_locker.ts

packages/compiler-cli/ngcc/test/locking/async_locker_spec.ts

petebacondarwin · 2020-05-30T20:03:06Z

packages/compiler-cli/ngcc/test/locking/async_locker_spec.ts

+          }
+          return lockFileContents;
+        });
+        spyOn(process, 'kill').and.callFake(() => {


Jasmine now has a .andThrowError() method. (https://jasmine.github.io/2.0/introduction.html#section-Spies:_%3Ccode%3Eand.throwError%3C/code%3E)

Yeah, that's what I used initially, but it wasted a bunch of my time 😒. The .andThrowError function takes a string or Error object, and since {code: 'ESRCH'} is neither it will convert it to a string ('[object Object]' 😒 ) which then will easily confuse you on why the test is failing, but it really should be working.

...and yes, I ended up having to looking for the source code for Jasmine, which annoyingly their documentation doesn't actually seem to point out the signatures for their test functions.

If you would like me to change the code I can, but it would probably need to be something like .andThrowError(Object.assign(new Error(), {code: 'ESRCH'}))

Sadly JavaScript still has some way to go till it is as nice as writing in a more mature language.

Ugh that is a pain... Happy to leave this as-is unless you wanted to make a helper function:

function createNodeError(code: string, message?: string) { const e = new Error(message); e.code = code; return e; }

And then do something like:

.andThrowError(createNodeError('ESRCH'));

(I haven't checked whether that works!)

Happy either way.

I slightly doubt the helper function would be reused 😕 so I think it may be better to just leave it as is.

I may look at submitting a patch to Jasmine so that if it's not strictly a string it doesn't try to construct an Error around the value passed in. That way this code could be changed in the future, or other people who are trying to do the same thing aren't bitten by the same problem.

I created PR jasmine/jasmine#1822 so hopefully they like the suggestion.

petebacondarwin · 2020-05-31T10:48:12Z

I tried this out on my Windows box and it appears to work! 🎉

JoostK

Being pedantic: this is marked as "feat" but targeting patch (which is 10.RC at the moment), so perhaps "perf" would be more appropriate.

gkalpak

A couple of optional nits. LGTM otherwise ✨

packages/compiler-cli/ngcc/src/locking/async_locker.ts

gkalpak · 2020-06-01T13:29:03Z

packages/compiler-cli/ngcc/src/locking/async_locker.ts

+            }
+
+            throw new TimeoutError(this.lockFileMessage(
+                `Lock found, but no process with PID ${pid} seems to be running.`));


Theoretically, it is still possible to have a race condition resulting in an inaccurate message:

Say pid: 1 is running so lockFile.write() fails at the top of the for loop.

We read the PID from the lock file and set the pid variable to 1.

Since attempts is 0, we call isProcessRunning(1).

Now, let's say that the process has ended between the last time we read the lock-file to get the PID and us calling isProcessRunning(1).

In this case, it is (theoretically) possible that another process has started and created a lock-file. Thus, the lockFile.write() call right after isProcessRunning(1) will fail (but not because the original PID (1) is still in the lock-file, as the error message will imply).

I admit that the likelihood of this race condition happening is too low (and could potentially be ignored), but we could avoid it by doing an extra check that pid === this.lockFile.read().

We can't even do pid === this.lockFile.read(), since this could also fall foul of a race condition. (Even more unlikely).
Perhaps the best option is to change the error message to accept that it might be that this is the case and still bail.

I contemplated doing another read for a PID change, but I figured it was rather unlikely, and the point of this PR was more to bail out and give a helpful message as to what might be happening. I can do the last ditch read if that might be useful, but just to be clear the complete message is the following:

Lock found, but no process with PID ${pid} seems to be running.
(If you are sure no ngcc process is running then you should delete the lock-file at ${this.lockFile.path}.)

and I felt that it was probably clear enough that there is a chance this was not correct (and why I didn't have ngcc automatically try to clean up the lock file).

I'm fine either way, but I'd probably need a suggestion on what to change the message to if that's the direction we want to take things.

We can't even do pid === this.lockFile.read(), since this could also fall foul of a race condition. (Even more unlikely).

I would still prefer the less likely race condition 😁
But I don't feel too strongly about it, given the likelyhood is quite low anyway. Let's leave it as is (I doubt anyone will run into it often enough to be bothered 😛).

Since I was rebasing the change, I added a final .read() to check the PID remained the same.

petebacondarwin · 2020-06-01T15:14:34Z

Being pedantic: this is marked as "feat" but targeting patch (which is 10.RC at the moment), so perhaps "perf" would be more appropriate.

This makes sense. @terencehonles can you change this?

terencehonles · 2020-06-01T21:36:04Z

Being pedantic: this is marked as "feat" but targeting patch (which is 10.RC at the moment), so perhaps "perf" would be more appropriate.

This makes sense. @terencehonles can you change this?

Done, I'll let you or @gkalpak chime in on the remaining comments.

petebacondarwin · 2020-06-01T21:56:23Z

If @gkalpak doesn't have any more suggestions by tomorrow night, I'll mark this for merge.

petebacondarwin · 2020-06-02T13:12:46Z

@terencehonles - the CI failure appears to be related to a built files size regression in one of integration tests. I don't think this PR should affect that. Can you try rebasing on master?

terencehonles · 2020-06-02T17:14:02Z

@petebacondarwin done, and almost forgot to push my change for #37250 (comment) 😅

Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up.

terencehonles · 2020-06-02T17:54:40Z

Looks like I probably needed 1b55da1 to fix the size limits

petebacondarwin · 2020-06-02T18:19:14Z

Good work @terencehonles - CI is green! We are ready to merge.

Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up. PR Close #37250

) Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up. PR Close angular#37250

angular-automatic-lock-bot · 2020-07-04T16:13:06Z

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

_{This action has been performed automatically by a bot.}

) Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up. PR Close angular#37250

googlebot added the cla: yes label May 22, 2020

pullapprove bot requested a review from alxhub May 22, 2020 04:35

terencehonles commented May 22, 2020

View reviewed changes

packages/compiler-cli/ngcc/src/locking/lock_file_with_child_process/index.ts Outdated Show resolved Hide resolved

petebacondarwin requested changes May 22, 2020

View reviewed changes

ngbot bot added this to the Backlog milestone May 22, 2020

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 4794839 to 76ee8e0 Compare May 22, 2020 17:51

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 76ee8e0 to 4cab9e9 Compare May 22, 2020 19:03

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch 3 times, most recently from ab0808f to 4072f36 Compare May 22, 2020 22:53

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 4072f36 to e7af3af Compare May 22, 2020 23:36

petebacondarwin approved these changes May 30, 2020

View reviewed changes

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch 2 times, most recently from e6b104f to c8e12eb Compare June 1, 2020 01:52

petebacondarwin removed the request for review from alxhub June 1, 2020 08:52

JoostK approved these changes Jun 1, 2020

View reviewed changes

gkalpak approved these changes Jun 1, 2020

View reviewed changes

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from c8e12eb to b93b979 Compare June 1, 2020 21:29

terencehonles changed the title ~~feat(ngcc): allow immediately reporting a stale lock file~~ perf(ngcc): allow immediately reporting a stale lock file Jun 1, 2020

petebacondarwin added action: merge The PR is ready for merge by the caretaker and removed action: cleanup The PR is in need of cleanup, either due to needing a rebase or in response to comments from reviews labels Jun 2, 2020

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch 2 times, most recently from 29a66ea to 3d35d88 Compare June 2, 2020 17:13

terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 3d35d88 to f8b6c2a Compare June 2, 2020 17:54

matsko closed this in 561c0f8 Jun 2, 2020

terencehonles deleted the ngcc-immediately-report-stale-lock-file branch June 3, 2020 18:13

terencehonles mentioned this pull request Jun 26, 2020

allow spy throwError to throw an Object jasmine/jasmine#1822

Merged

9 tasks

angular-automatic-lock-bot bot locked and limited conversation to collaborators Jul 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(ngcc): allow immediately reporting a stale lock file #37250

perf(ngcc): allow immediately reporting a stale lock file #37250

terencehonles commented May 22, 2020 •

edited

petebacondarwin left a comment

terencehonles commented May 22, 2020

petebacondarwin commented May 22, 2020

terencehonles commented May 22, 2020

gkalpak commented May 22, 2020

terencehonles commented May 22, 2020

terencehonles commented May 22, 2020

terencehonles commented May 22, 2020

petebacondarwin left a comment

petebacondarwin May 30, 2020

terencehonles Jun 1, 2020 •

edited

petebacondarwin Jun 1, 2020

terencehonles Jun 2, 2020

terencehonles Jun 2, 2020

petebacondarwin commented May 31, 2020

JoostK left a comment

gkalpak left a comment

gkalpak Jun 1, 2020

petebacondarwin Jun 1, 2020

terencehonles Jun 1, 2020

gkalpak Jun 2, 2020

terencehonles Jun 2, 2020

petebacondarwin commented Jun 1, 2020

terencehonles commented Jun 1, 2020

petebacondarwin commented Jun 1, 2020

petebacondarwin commented Jun 2, 2020

terencehonles commented Jun 2, 2020 •

edited

terencehonles commented Jun 2, 2020

petebacondarwin commented Jun 2, 2020

angular-automatic-lock-bot bot commented Jul 4, 2020

perf(ngcc): allow immediately reporting a stale lock file #37250

perf(ngcc): allow immediately reporting a stale lock file #37250

Conversation

terencehonles commented May 22, 2020 • edited

PR Checklist

PR Type

What is the current behavior?

What is the new behavior?

Does this PR introduce a breaking change?

petebacondarwin left a comment

Choose a reason for hiding this comment

terencehonles commented May 22, 2020

petebacondarwin commented May 22, 2020

terencehonles commented May 22, 2020

gkalpak commented May 22, 2020

terencehonles commented May 22, 2020

terencehonles commented May 22, 2020

terencehonles commented May 22, 2020

petebacondarwin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

terencehonles Jun 1, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

petebacondarwin commented May 31, 2020

JoostK left a comment

Choose a reason for hiding this comment

gkalpak left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

petebacondarwin commented Jun 1, 2020

terencehonles commented Jun 1, 2020

petebacondarwin commented Jun 1, 2020

petebacondarwin commented Jun 2, 2020

terencehonles commented Jun 2, 2020 • edited

terencehonles commented Jun 2, 2020

petebacondarwin commented Jun 2, 2020

angular-automatic-lock-bot bot commented Jul 4, 2020

terencehonles commented May 22, 2020 •

edited

terencehonles Jun 1, 2020 •

edited

terencehonles commented Jun 2, 2020 •

edited