Fix race condition #195

olivernybroe · 2021-11-12T14:44:06Z

I think I might have found the reason for the race condition.
So looking at the following script https://github.com/actions/cache/blob/main/src/restore.ts it executes an promise, but doesn't return it.

We however require this script and treats it as if it were a promise, but the script doesn't give us a promise back. so we do an await on nothing basically.

To fix this, I chose to use the github action caching api directly, instead of executing the github caching actions script directly. This means we now get a Promise back and we actually have something we can wait on.

I reran this around 20 times on our repository to try and reproduce the error (we got this error like every 5 times) and it seems to be gone now.

I have put it live on our repository and will run with our branch for a week or so before taking this PR out of draft 👍

for people who wanna try out our solution and test it with us, you can use it the following way

      - name: Install Composer dependencies
        uses: worksome/composer-install@fix-race-condition

resolves: #161
resolves: #152

ramsey · 2021-11-12T16:10:03Z

This is great! Thanks!

I've put out a call on Twitter to ask folks who use this to test your changes. https://twitter.com/ramsey/status/1459190635479674884

Lewiscowles1986 · 2021-11-17T08:06:03Z

It reads a lot better and seems like it works.

The logic makes sense; however there is no test case for the await on nothing. No synthetic crash reproduction. So if it works, the breakage is not documented anywhere; just fixed. This will make it harder to re-factor and avoid regression.

I think if https://stackoverflow.com/a/55263084/1548557 is correct, then await will first check if something has a then method. Do you think checking for the presence of this for all cases would help?

Secondly, having a then Clause won't guarantee something returns a response. So you could also check the value of the awaited result, ensuring it is never null or undefined (perhaps using ?? with a known bad value so it's explicit and readable?

What do you think?

olivernybroe · 2021-11-17T09:11:07Z

@Lewiscowles1986 I agree, this could really use a test case for making sure the scenario is not added back in later.

however there is no test case for the await on nothing.

I tried creating it when I wrote my PR, but simply couldn't get the mockery library to do as I wanted so I just said "fuck it".
I simply do not know typescript, and didn't feel like spending more time on this PR. Would appreciate help on this part.

Do you think checking for the presence of this for all cases would help?

Yes, I do think that would be a good idea. However linting should fail if what is given is not a promise. This is why I changed the typehints to be correct and require promises in the typehints. This way linting would complain when non-promises are used here. Do you think that is enough or where you envisioning something more?

So you could also check the value of the awaited result, ensuring it is never null or undefined

Hmm, that might be possible, as the cache functions do have some return values. However we do now have typehints requiring them to return that type. so in theory linting should catch it ;)

Thanks a lot for the feedback and trying to also do the research to figure out if this indeed is a solution

Lewiscowles1986 · 2021-11-20T08:50:36Z

Await on no test would look something like

it('should always return a promise', async () => {
    const result = await Promise.resolve(/* call to method not returning a promise that should */);
    expect(result).toBeTruthy();
});

TypeScript may be a complicating factor in this though. I've hit this a few times where I basically had to move a TypeScript project to JS because valid typescript with all it's supposed protections; prevented me from testing behavior that could be measured after transpiling to JS.

Maybe someone else has a tip I'm unaware of if this doesn't work.

Specifically that should fail for master, but pass for your branch.

olivernybroe · 2021-11-20T18:58:39Z

So I think we can safely say that the solution here is working.
We have had no errors of the race condition in a whole week 👍

However @Lewiscowles1986 does have a valid point that we are missing test for it. However if I have to create the test it will take a while.
So I'll let @ramsey choose here, if they either can get someone else to do the test, do it themselves, or if they will merge it in without any test.

kevinpapst · 2021-11-25T11:51:15Z

This issue causes (only for me) at least once a day a failed build / the need to manually trigger "rebuild all".

As there is no test yet that fails even though there is a problem for at least some users, I'd argue that fixing the issue is more important than waiting for a test case for code which wasn't covered before.
Or in other words: this workflow is more or less "feature complete", so do you really expect that this problem will be reintroduced in the future? I don't see that many changes in the last year and no real need for in the future.

There is another question that I'd like to raise: should the dist files be included in the merge (good for testing) or rather being rebuild after merged by the repo maintainer (the user base trusts this repo)?

ruudk · 2021-11-25T11:56:19Z

I agree, I also stumble on this multiple times per week on different repositories.

olivernybroe · 2021-11-25T13:14:21Z

@kevinpapst Agreed, I also agree that when/if this is being merged by @ramsey, he should rebuild it, so it's not my dist files 👍

ruudk · 2021-12-07T09:20:51Z

@ramsey Would it be possible to merge this? 🙏

kevinpapst · 2021-12-13T23:28:14Z

@ramsey this is open source, it is your right to ignore our problems here. but I want to try it once again: could you please be so kind and share your plans about this topic. I am hitting that re-run all jobs literally every day and this last one was once too often 😁 I just want to know if we have a chance that this will be integrated and released anytime soon or if it is better that we are setting up our own forks of the action?

ramsey · 2021-12-13T23:31:01Z

@kevinpapst I've been waiting to see if @olivernybroe or anyone else would provide a test. You're welcome to run the action from your own fork.

kevinpapst · 2021-12-14T01:26:44Z

Thanks for your feedback @ramsey, and thanks for your action. It solves a problem for a certain target group in a neat way.
But as parallel builds are a problem and all my projects use matrix build I cannot wait any longer for a fix that exist but does not get merged.

For anyone interested, you can use the cache action combined with the speed improvements from Composer 2 to get the same build times (I could not see any speed difference to this action here).

All you have to do is to replace:

            -    name: Install dependencies
                 uses: ramsey/composer-install@v1

with

            -   name: Determine composer cache directory
                id: composer-cache
                run: "echo \"::set-output name=directory::$(composer config cache-dir)\""

            -   name: Cache Composer dependencies
                uses: actions/cache@v2
                with:
                    path: "${{ steps.composer-cache.outputs.directory }}"
                    key: ${{ runner.os }}-${{ matrix.php }}-${{ hashFiles('**/composer.lock') }}

            -   name: Install dependencies
                run: composer install

Lewiscowles1986 · 2021-12-16T14:20:54Z

@ramsey I've not been able to recreate a race in local tests. This is my key reason for not attempting a test.

While I understand this might be frustrating, I think it's a wise choice to hold-off until we can make a reproductive case test, and then it can be confirmed why the flakes, and how they were fixed.

ramsey · 2021-12-23T19:47:59Z

GitHub now supports uses on steps in a composite action. They did not support this when I began developing this action, which is why I had to develop it as a JavaScript action.

Since I'm now able to develop this as a composite action, I've shifted away from JavaScript and have converted this to a composite action in v2.

As a side note: when I first began developing this, GitHub's cache action also was not a separate library intended for use by other actions. They've since changed this, as well, so if I couldn't switch to a composite action, I would have updated to the latest version of their library, which might have fixed this race condition, among other issues.

Now that I'm using GitHub's cache action directly as part of a composite action, I think any of the race conditions and inability to read cache files that folks were experiencing (due to my bad JavaScript code) should be cleared up.

Please upgrade using ramsey/composer-install@v2 or ramsey/composer-install@2.0.0. Details are available on the release announcement and the README.

Thanks for your PR and the good discussion here that encouraged me to make this change!

olivernybroe · 2021-12-23T20:01:33Z

@ramsey amazing! Looking forward to changing to version 2 🎉

Thanks for the work to change this to a composite action!

Changin caching to use official api instead

621be4b

olivernybroe marked this pull request as draft November 12, 2021 14:44

olivernybroe marked this pull request as ready for review November 25, 2021 13:14

ruudk mentioned this pull request Dec 14, 2021

Unable to reserve cache with key X another job may be creating this cache shivammathur/cache-extensions#22

Closed

ramsey closed this Dec 23, 2021

owenvoke deleted the fix-race-condition branch March 12, 2024 15:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix race condition #195

Fix race condition #195

olivernybroe commented Nov 12, 2021 •

edited

ramsey commented Nov 12, 2021

Lewiscowles1986 commented Nov 17, 2021

olivernybroe commented Nov 17, 2021 •

edited

Lewiscowles1986 commented Nov 20, 2021 •

edited

olivernybroe commented Nov 20, 2021

kevinpapst commented Nov 25, 2021

ruudk commented Nov 25, 2021

olivernybroe commented Nov 25, 2021

ruudk commented Dec 7, 2021

kevinpapst commented Dec 13, 2021

ramsey commented Dec 13, 2021

kevinpapst commented Dec 14, 2021

Lewiscowles1986 commented Dec 16, 2021

ramsey commented Dec 23, 2021

olivernybroe commented Dec 23, 2021

Fix race condition #195

Fix race condition #195

Conversation

olivernybroe commented Nov 12, 2021 • edited

ramsey commented Nov 12, 2021

Lewiscowles1986 commented Nov 17, 2021

olivernybroe commented Nov 17, 2021 • edited

Lewiscowles1986 commented Nov 20, 2021 • edited

olivernybroe commented Nov 20, 2021

kevinpapst commented Nov 25, 2021

ruudk commented Nov 25, 2021

olivernybroe commented Nov 25, 2021

ruudk commented Dec 7, 2021

kevinpapst commented Dec 13, 2021

ramsey commented Dec 13, 2021

kevinpapst commented Dec 14, 2021

Lewiscowles1986 commented Dec 16, 2021

ramsey commented Dec 23, 2021

olivernybroe commented Dec 23, 2021

olivernybroe commented Nov 12, 2021 •

edited

olivernybroe commented Nov 17, 2021 •

edited

Lewiscowles1986 commented Nov 20, 2021 •

edited