Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(ngcc): allow immediately reporting a stale lock file #37250

Conversation

terencehonles
Copy link
Contributor

@terencehonles terencehonles commented May 22, 2020

Currently, if an ngcc process is killed in a manner that it doesn't clean up its lock file (or is killed too quickly) the compiler reports that it is waiting on the PID of a process that doesn't exist, and that it will wait up to a maximum of N seconds. This PR updates the locking code to additionally check if the process exists, and if it does not it will immediately bail out, and print the location of the lock file so a user may clean it up.

PR Checklist

Please check if your PR fulfills the following requirements:

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • angular.io application / infrastructure changes
  • Other... Please describe:

What is the current behavior?

ngcc will wait on a lock file that will never be resolved until the maximum timeout is reached, and only then will it tell the user the path to delete the lock file.

Issue Number: N/A

What is the new behavior?

ngcc will exit immediately, and report the path to the lock file, and let the user decide if it should be deleted.

Does this PR introduce a breaking change?

  • Yes
  • No

@pullapprove pullapprove bot requested a review from alxhub May 22, 2020 04:35
Copy link
Member

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @terencehonles - thanks for sending in this PR. It is a great idea.
Given the SyncLocker already bails out immediately when the lockfile is not free then this is actually only a concern of the AsyncLocker. So I think we should move the logic there, where we are already doing a catch of the error and checking for EEXIST.

Also I would be keen to make the check for whether the referenced process is still running to a separate method of AsyncLocker:

private isRunning(pid: string): boolean {
  try {
    return process.kill(+pid, 0);
  } catch (e) {
    // if there is a problem checking if the process is running throw the original error.
    if (e.code !== 'ESRCH') {
      throw e;
    }
  }
  return false;
}

@petebacondarwin petebacondarwin added comp: ngcc effort1: hours freq2: medium help wanted An issue that is suitable for a community contributor (based on its complexity/scope). action: cleanup The PR is in need of cleanup, either due to needing a rebase or in response to comments from reviews target: patch This PR is targeted for the next patch release risk: low feature Issue that requests a new feature labels May 22, 2020
@ngbot ngbot bot added this to the Backlog milestone May 22, 2020
@terencehonles
Copy link
Contributor Author

@petebacondarwin sounds good! I was contemplating breaking it out to a function, but was avoiding changing the surface of the LockFile interface. I had noticed the sync locker didn't retry, but I figured that might be coming from somewhere else. I guess not, and I realize now, that I ran into this because I was building an angular project "experimentally" instead of running ngcc like our normal build tooling does.

The other reason I put it in the LockFile implementation is that implementation is the one reading/writing the lock file and describing how the lock is actually implemented including its contents. I was thinking there could be a case where the lock is being requested over the network or in some space that the PID wouldn't be found anymore, but still be valid. I'm probably overthinking things, and the lockers both do know enough about the internal implementation that I can see that the better process boundary would be at the locker's .lock(...) method. I still think the actual contents of the lock file is more "owned" by the LockFile itself, but I don't mind moving the code.

@petebacondarwin
Copy link
Member

I agree that the underlying implementation of the lockfile are leaking into the lockers already (e.g. the EEXIST error.

The main reason for abstracting it out was to allow the two different lockers, rather than having multiple different lock files. If we decided to go that way then I think we would need to move more into the lockfile and out of the lockers anyway.

@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 4794839 to 76ee8e0 Compare May 22, 2020 17:51
@terencehonles
Copy link
Contributor Author

@petebacondarwin yup, anyways I refactored the PR. I noticed a lint error from before and figured out how to run the formatter, and I also noticed the build failed because process.kill is not typed to return boolean. I believe the cast I did and the changes should be satisfactory, but I'll poke at the tests when they run and hopefully they just pass 🙂

@gkalpak
Copy link
Member

gkalpak commented May 22, 2020

Probably stating the obvious (but just in case): We need to ensure that this works on Linux, macOS and Windows.

@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 76ee8e0 to 4cab9e9 Compare May 22, 2020 19:03
@terencehonles
Copy link
Contributor Author

Probably stating the obvious (but just in case): We need to ensure that this works on Linux, macOS and Windows.

Yup, I'm on Linux and macOS should be the same. From what I can tell it does seem like node emulates kill(pid, 0) for Windows, but I don't have a machine to easily test against.

@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch 3 times, most recently from ab0808f to 4072f36 Compare May 22, 2020 22:53
@terencehonles
Copy link
Contributor Author

Ok, all the tests should be passing at this point.

@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 4072f36 to e7af3af Compare May 22, 2020 23:36
@terencehonles
Copy link
Contributor Author

Ok, all the tests should be passing at this point.

GitHub seems to be having issues (I believe some of the tests failed to run, not the tests failed). Trying to shake things free.

Copy link
Member

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of NITs - but this looks good. We need to check that it works on Windows...

packages/compiler-cli/ngcc/src/locking/async_locker.ts Outdated Show resolved Hide resolved
packages/compiler-cli/ngcc/src/locking/async_locker.ts Outdated Show resolved Hide resolved
}
return lockFileContents;
});
spyOn(process, 'kill').and.callFake(() => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I used initially, but it wasted a bunch of my time 😒. The .andThrowError function takes a string or Error object, and since {code: 'ESRCH'} is neither it will convert it to a string ('[object Object]' 😒 ) which then will easily confuse you on why the test is failing, but it really should be working.

...and yes, I ended up having to looking for the source code for Jasmine, which annoyingly their documentation doesn't actually seem to point out the signatures for their test functions.

If you would like me to change the code I can, but it would probably need to be something like .andThrowError(Object.assign(new Error(), {code: 'ESRCH'}))

Sadly JavaScript still has some way to go till it is as nice as writing in a more mature language.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh that is a pain... Happy to leave this as-is unless you wanted to make a helper function:

function createNodeError(code: string, message?: string) {
  const e = new Error(message);
  e.code = code;
  return e;
}

And then do something like:

.andThrowError(createNodeError('ESRCH'));

(I haven't checked whether that works!)

Happy either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly doubt the helper function would be reused 😕 so I think it may be better to just leave it as is.

I may look at submitting a patch to Jasmine so that if it's not strictly a string it doesn't try to construct an Error around the value passed in. That way this code could be changed in the future, or other people who are trying to do the same thing aren't bitten by the same problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created PR jasmine/jasmine#1822 so hopefully they like the suggestion.

@petebacondarwin
Copy link
Member

I tried this out on my Windows box and it appears to work! 🎉

@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch 2 times, most recently from e6b104f to c8e12eb Compare June 1, 2020 01:52
@petebacondarwin petebacondarwin removed the request for review from alxhub June 1, 2020 08:52
Copy link
Member

@JoostK JoostK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being pedantic: this is marked as "feat" but targeting patch (which is 10.RC at the moment), so perhaps "perf" would be more appropriate.

Copy link
Member

@gkalpak gkalpak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of optional nits. LGTM otherwise ✨

}

throw new TimeoutError(this.lockFileMessage(
`Lock found, but no process with PID ${pid} seems to be running.`));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, it is still possible to have a race condition resulting in an inaccurate message:

  • Say pid: 1 is running so lockFile.write() fails at the top of the for loop.
  • We read the PID from the lock file and set the pid variable to 1.
  • Since attempts is 0, we call isProcessRunning(1).
  • Now, let's say that the process has ended between the last time we read the lock-file to get the PID and us calling isProcessRunning(1).
  • In this case, it is (theoretically) possible that another process has started and created a lock-file. Thus, the lockFile.write() call right after isProcessRunning(1) will fail (but not because the original PID (1) is still in the lock-file, as the error message will imply).

I admit that the likelihood of this race condition happening is too low (and could potentially be ignored), but we could avoid it by doing an extra check that pid === this.lockFile.read().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't even do pid === this.lockFile.read(), since this could also fall foul of a race condition. (Even more unlikely).
Perhaps the best option is to change the error message to accept that it might be that this is the case and still bail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I contemplated doing another read for a PID change, but I figured it was rather unlikely, and the point of this PR was more to bail out and give a helpful message as to what might be happening. I can do the last ditch read if that might be useful, but just to be clear the complete message is the following:

Lock found, but no process with PID ${pid} seems to be running.
(If you are sure no ngcc process is running then you should delete the lock-file at ${this.lockFile.path}.)

and I felt that it was probably clear enough that there is a chance this was not correct (and why I didn't have ngcc automatically try to clean up the lock file).

I'm fine either way, but I'd probably need a suggestion on what to change the message to if that's the direction we want to take things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't even do pid === this.lockFile.read(), since this could also fall foul of a race condition. (Even more unlikely).

I would still prefer the less likely race condition 😁
But I don't feel too strongly about it, given the likelyhood is quite low anyway. Let's leave it as is (I doubt anyone will run into it often enough to be bothered 😛).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I was rebasing the change, I added a final .read() to check the PID remained the same.

@petebacondarwin
Copy link
Member

Being pedantic: this is marked as "feat" but targeting patch (which is 10.RC at the moment), so perhaps "perf" would be more appropriate.

This makes sense. @terencehonles can you change this?

@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from c8e12eb to b93b979 Compare June 1, 2020 21:29
@terencehonles terencehonles changed the title feat(ngcc): allow immediately reporting a stale lock file perf(ngcc): allow immediately reporting a stale lock file Jun 1, 2020
@terencehonles
Copy link
Contributor Author

Being pedantic: this is marked as "feat" but targeting patch (which is 10.RC at the moment), so perhaps "perf" would be more appropriate.

This makes sense. @terencehonles can you change this?

Done, I'll let you or @gkalpak chime in on the remaining comments.

@petebacondarwin
Copy link
Member

If @gkalpak doesn't have any more suggestions by tomorrow night, I'll mark this for merge.

@petebacondarwin petebacondarwin added action: merge The PR is ready for merge by the caretaker and removed action: cleanup The PR is in need of cleanup, either due to needing a rebase or in response to comments from reviews labels Jun 2, 2020
@petebacondarwin
Copy link
Member

@terencehonles - the CI failure appears to be related to a built files size regression in one of integration tests. I don't think this PR should affect that. Can you try rebasing on master?

@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch 2 times, most recently from 29a66ea to 3d35d88 Compare June 2, 2020 17:13
@terencehonles
Copy link
Contributor Author

terencehonles commented Jun 2, 2020

@petebacondarwin done, and almost forgot to push my change for #37250 (comment) 😅

Currently, if an ngcc process is killed in a manner that it doesn't clean
up its lock file (or is killed too quickly) the compiler reports that it
is waiting on the PID of a process that doesn't exist, and that it will
wait up to a maximum of N seconds. This PR updates the locking code to
additionally check if the process exists, and if it does not it will
immediately bail out, and print the location of the lock file so a user
may clean it up.
@terencehonles terencehonles force-pushed the ngcc-immediately-report-stale-lock-file branch from 3d35d88 to f8b6c2a Compare June 2, 2020 17:54
@terencehonles
Copy link
Contributor Author

Looks like I probably needed 1b55da1 to fix the size limits

@petebacondarwin
Copy link
Member

Good work @terencehonles - CI is green! We are ready to merge.

matsko pushed a commit that referenced this pull request Jun 2, 2020
Currently, if an ngcc process is killed in a manner that it doesn't clean
up its lock file (or is killed too quickly) the compiler reports that it
is waiting on the PID of a process that doesn't exist, and that it will
wait up to a maximum of N seconds. This PR updates the locking code to
additionally check if the process exists, and if it does not it will
immediately bail out, and print the location of the lock file so a user
may clean it up.

PR Close #37250
@matsko matsko closed this in 561c0f8 Jun 2, 2020
@terencehonles terencehonles deleted the ngcc-immediately-report-stale-lock-file branch June 3, 2020 18:13
ngwattcos pushed a commit to ngwattcos/angular that referenced this pull request Jun 25, 2020
)

Currently, if an ngcc process is killed in a manner that it doesn't clean
up its lock file (or is killed too quickly) the compiler reports that it
is waiting on the PID of a process that doesn't exist, and that it will
wait up to a maximum of N seconds. This PR updates the locking code to
additionally check if the process exists, and if it does not it will
immediately bail out, and print the location of the lock file so a user
may clean it up.

PR Close angular#37250
@angular-automatic-lock-bot
Copy link

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Jul 4, 2020
profanis pushed a commit to profanis/angular that referenced this pull request Sep 5, 2020
)

Currently, if an ngcc process is killed in a manner that it doesn't clean
up its lock file (or is killed too quickly) the compiler reports that it
is waiting on the PID of a process that doesn't exist, and that it will
wait up to a maximum of N seconds. This PR updates the locking code to
additionally check if the process exists, and if it does not it will
immediately bail out, and print the location of the lock file so a user
may clean it up.

PR Close angular#37250
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
action: merge The PR is ready for merge by the caretaker cla: yes effort1: hours feature Issue that requests a new feature freq2: medium help wanted An issue that is suitable for a community contributor (based on its complexity/scope). risk: low target: patch This PR is targeted for the next patch release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants