Concurrent git commands on the same repo fail #211

mattwynne · 2020-09-10T22:17:59Z

Related to #210, if you run a manual fetch immediately after a connect, you'll see a failure because the background fetch triggered by the domain rule seems to be clashing with the manual fetch.

The actual error I see is this:

[server]   <-- POST /repos/smoke-test-423955FC-68A3-4A72-B4ED-B9FCBEED12D8
[server] (node:7496) UnhandledPromiseRejectionWarning: Error: Git command `config gc.pruneExpire never` returned exit code 255:
[server] error: could not lock config file config: File exists
[server]
[server]     at GitDirectory.exec (/Users/matt/projects/git-en-boite/packages/local-git/src/git_directory.ts:27:13)
[server]     at processTicksAndRejections (internal/process/task_queues.js:97:5)
[server]     at exports.handleInit (/Users/matt/projects/git-en-boite/packages/local-git/src/handlers/handleInit.ts:9:3)
[server]     at RepoFactory.open (/Users/matt/projects/git-en-boite/packages/local-git/src/repo_factory.ts:49:5)
[server]     at Function.openGitRepo (/Users/matt/projects/git-en-boite/packages/local-git/src/dugite_git_repo.ts:17:22)
[server]     at BackgroundGitRepos.openGitRepo (/Users/matt/projects/git-en-boite/packages/local-git/src/background_git_repos.ts:64:21)
[server]     at DiskRepoIndex.find (/Users/matt/projects/git-en-boite/packages/repo-index/src/disk_repo_index.ts:20:29)
[server]     at LaBoîte.fetchFromRemote (/Users/matt/projects/git-en-boite/packages/app/src/la_boîte.ts:43:18)

So this looks like at the moment it's just caused the config commands we run in handleInit which we're blindly running each time you open a repo folder. We can probably avoid this, but there might be other problems with the other git commands when we get past this. Needs more investigation.

The text was updated successfully, but these errors were encountered:

Use it in the appropriate places in the tests

stale · 2020-10-28T22:20:51Z

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. Thank you for your contributions 🙏.

mattwynne · 2021-02-25T15:06:56Z

We saw this again today when testing #545

mattwynne · 2021-03-04T19:33:16Z

OK I've caught this today with a new acceptance test. You can reproduce it with:

show_logs=1 yarn acceptance test:wip

Here's the error we see:

fatal: Unable to create '/private/var/folders/l9/95tdbmtd3_s0jh3m9hjt6sk80000gn/T/tmp-84916-93ii628SIl64/7265706f2d38323034333935352d383137302d343661302d613234302d316536353434313064626237/shallow.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.

at GitDirectory.exec (/Users/matt/projects/git-en-boite/packages/local-clones/src/git_directory.ts:24:30)

mattwynne · 2021-03-04T19:34:19Z

It's interesting that it's using shallow.lock in this instance. Most of the stuff I can find about git's lockfiles talks about index.lock though even that I can't find much documentation about.

Relates-to #211 It's debatable whether this deserves to be surfaced in an acceptance test, but I'm not sure yet which layer is the right place to solve this, so it makes sense to put it in our full-stack tests in any case. And it's probably useful to document this capabilty.

mattwynne · 2021-03-05T15:45:25Z

Moved this into a branch: 211-support-concurrent-git-commands so we don't have too many WIP scenarios in the main branch.

mattwynne · 2021-03-05T19:54:11Z

I've resolved this for now by returning a 503 error with a 60-second retry-after header. This is better than a dumb 500 error, but much quicker than trying to handle the concurrency issue properly. I think we should monitor it and see how much of a problem it becomes.

We could also do more proactive exploratory testing on it. I can imagine that there could be other errors coming from git than the ones it currently handles if we try doing several things at the same time, such as making a commit during a fetch, or fetching during a push.

mattwynne · 2021-03-05T20:01:00Z

I've had a think about the options for resolving this "properly". Largely, this comes down to choices about which layer we want to handle this problem at. Is it something for the local-clones git adapter to handle, or should the domain be aware that a repo could be busy? Or should the application itself be responsible for queuing commands for each repo so only one can happen at the same time.

One thing I'm pretty sure of is that we should avoid trying to solve this by bending the bullmq infrastructure to use one queue per repo. We tried this before (see ADR 0011) and it didn't work out well.

My hunch is that the problem could be solved by domain events, once we've implemented #242 so that any node can see all the events.

Ref: #211 Co-authored by: Romain Gérard <romaingweb@gmail.com>

mattwynne · 2021-03-15T20:36:28Z

I wonder whether this is another instance of this same problem:

InvalidRepoUrl: warning: unable to access './config': Stale file handle
fatal: unable to access 'config': Stale file handle

    at handleRemoteError (/app/packages/local-clones/dist/handlers/handleValidateRemote.js:9:11)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async handleValidateRemote (/app/packages/local-clones/dist/handlers/handleValidateRemote.js:12:5)
    at async handleConnect (/app/packages/local-clones/dist/handlers/handleConnect.js:6:5)
    at async Worker.processJob [as processFn] (/app/packages/local-clones/dist/background_worker_local_clones.js:123:28)
    at async Worker.processJob (/app/node_modules/bullmq/dist/classes/worker.js:234:28)
    at async Worker.run (/app/node_modules/bullmq/dist/classes/worker.js:99:34)

mattwynne · 2021-03-17T15:07:13Z

We're doing #572 first, which should cut down a lot of the problems by only doing fetches when absolutely neccesary.

mattwynne · 2021-03-18T03:54:34Z

I have been thinking about this today, inspired by the GitHub Desktop blog post shared by @romaingweb on Slack.

If we want to get an exclusive lock on a Repo before working with it, I think the right place to do that would be in the InventoryOfRepos which is where we get hold of a Repo domain model instance before calling methods to do work through our local clones.

We can change this interface to offer either an ExclusiveRepo or a ConcurrentRepo depending on which method you call, and put the different methods on Repo into either interface, depending on whether we believe the operation in question needs to be done exclusively or not. For calls to the ExclusiveRepo we can drop a lock file on disk until the transaction has completed, and any other calls to ask for an ExclusiveRepo can block until that lock file has been removed.

stale · 2021-04-17T04:20:06Z

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. Thank you for your contributions 🙏.

mattwynne · 2021-04-26T16:41:32Z

We could use something like https://www.npmjs.com/package/redis-semaphore to hold a mutex on the repo while working on it.

cbohiptest · 2021-05-04T12:27:55Z

The concurrency problems does not happen anymore since only one container is used for workers.
With the deployment on rancher the number of container has been reduced to 1 that prevents parallel call on the same repo.

The problem will need to be resolved when more containers will be needed.

An ADR has been written in 382a2d0.

Co-authored-by: Julien Biezemans <jb@jbpros.com> Co-authored-by: cbohiptest <celine.bon@hiptest.net>

mattwynne changed the title ~~Concurrent fetch commands fail~~ Concurrent git commands fail Sep 11, 2020

mattwynne changed the title ~~Concurrent git commands fail~~ Concurrent git commands on the same repo fail Sep 11, 2020

mattwynne added a commit that referenced this issue Sep 28, 2020

Simplify code around dispatching messages to repos #211

c84578d

mattwynne added a commit that referenced this issue Sep 28, 2020

Only init a repo directory when it doesn't already exist #211

0f435f5

mattwynne changed the title ~~Concurrent git commands on the same repo fail~~ Concurrent git init commands on the same repo fail Sep 28, 2020

mattwynne added a commit that referenced this issue Sep 28, 2020

Move actual repo create/init down a level #211

330da83

Use it in the appropriate places in the tests

mattwynne changed the title ~~Concurrent git init commands on the same repo fail~~ Concurrent git commands on the same repo fail Sep 28, 2020

stale bot added the wontfix label Oct 28, 2020

stale bot closed this as completed Nov 5, 2020

mattwynne added the debt Improvement of software/operational architecture (e.g. infrastructure, automation, refactoring) label Nov 11, 2020

mattwynne reopened this Feb 25, 2021

mattwynne mentioned this issue Mar 3, 2021

Only emit repo.connected when the repo is connected for the first time #572

Closed

mattwynne added a commit that referenced this issue Mar 5, 2021

Remove WIP scenario for #211 - it's now in a branch

1c29701

mattwynne mentioned this issue Mar 5, 2021

Support concurrent git commands #578

Merged

mattwynne closed this as completed in #578 Mar 5, 2021

mattwynne reopened this Mar 5, 2021

mattwynne added a commit that referenced this issue Mar 8, 2021

Change 503 to 429 when concurrent requests fail

6b10466

Ref: #211 Co-authored by: Romain Gérard <romaingweb@gmail.com>

stale bot added the wontfix Things we've decided not to do label Apr 17, 2021

stale bot closed this as completed Apr 24, 2021

mattwynne reopened this Apr 26, 2021

stale bot removed the wontfix Things we've decided not to do label Apr 26, 2021

mattwynne added pinned wontfix Things we've decided not to do labels Apr 26, 2021

stale bot removed the wontfix Things we've decided not to do label Apr 26, 2021

mattwynne referenced this issue May 5, 2021

Write ADR about handling concurrent git operations

382a2d0

Co-authored-by: Julien Biezemans <jb@jbpros.com> Co-authored-by: cbohiptest <celine.bon@hiptest.net>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrent git commands on the same repo fail #211

Concurrent git commands on the same repo fail #211

mattwynne commented Sep 10, 2020

stale bot commented Oct 28, 2020

mattwynne commented Feb 25, 2021

mattwynne commented Mar 4, 2021 •

edited

mattwynne commented Mar 4, 2021

mattwynne commented Mar 5, 2021

mattwynne commented Mar 5, 2021

mattwynne commented Mar 5, 2021 •

edited

mattwynne commented Mar 15, 2021

mattwynne commented Mar 17, 2021

mattwynne commented Mar 18, 2021

stale bot commented Apr 17, 2021

mattwynne commented Apr 26, 2021

cbohiptest commented May 4, 2021

Concurrent git commands on the same repo fail #211

Concurrent git commands on the same repo fail #211

Comments

mattwynne commented Sep 10, 2020

stale bot commented Oct 28, 2020

mattwynne commented Feb 25, 2021

mattwynne commented Mar 4, 2021 • edited

mattwynne commented Mar 4, 2021

mattwynne commented Mar 5, 2021

mattwynne commented Mar 5, 2021

mattwynne commented Mar 5, 2021 • edited

mattwynne commented Mar 15, 2021

mattwynne commented Mar 17, 2021

mattwynne commented Mar 18, 2021

stale bot commented Apr 17, 2021

mattwynne commented Apr 26, 2021

cbohiptest commented May 4, 2021

mattwynne commented Mar 4, 2021 •

edited

mattwynne commented Mar 5, 2021 •

edited