Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition #195

Closed
wants to merge 1 commit into from
Closed

Conversation

olivernybroe
Copy link

@olivernybroe olivernybroe commented Nov 12, 2021

I think I might have found the reason for the race condition.
So looking at the following script https://github.com/actions/cache/blob/main/src/restore.ts it executes an promise, but doesn't return it.

We however require this script and treats it as if it were a promise, but the script doesn't give us a promise back. so we do an await on nothing basically.

To fix this, I chose to use the github action caching api directly, instead of executing the github caching actions script directly. This means we now get a Promise back and we actually have something we can wait on.

I reran this around 20 times on our repository to try and reproduce the error (we got this error like every 5 times) and it seems to be gone now.

I have put it live on our repository and will run with our branch for a week or so before taking this PR out of draft 馃憤

for people who wanna try out our solution and test it with us, you can use it the following way

      - name: Install Composer dependencies
        uses: worksome/composer-install@fix-race-condition

resolves: #161
resolves: #152

@olivernybroe olivernybroe marked this pull request as draft November 12, 2021 14:44
@ramsey
Copy link
Owner

ramsey commented Nov 12, 2021

This is great! Thanks!

I've put out a call on Twitter to ask folks who use this to test your changes. https://twitter.com/ramsey/status/1459190635479674884

@Lewiscowles1986
Copy link

It reads a lot better and seems like it works.

The logic makes sense; however there is no test case for the await on nothing. No synthetic crash reproduction. So if it works, the breakage is not documented anywhere; just fixed. This will make it harder to re-factor and avoid regression.

I think if https://stackoverflow.com/a/55263084/1548557 is correct, then await will first check if something has a then method. Do you think checking for the presence of this for all cases would help?

Secondly, having a then Clause won't guarantee something returns a response. So you could also check the value of the awaited result, ensuring it is never null or undefined (perhaps using ?? with a known bad value so it's explicit and readable?

What do you think?

@olivernybroe
Copy link
Author

olivernybroe commented Nov 17, 2021

@Lewiscowles1986 I agree, this could really use a test case for making sure the scenario is not added back in later.

however there is no test case for the await on nothing.

I tried creating it when I wrote my PR, but simply couldn't get the mockery library to do as I wanted so I just said "fuck it".
I simply do not know typescript, and didn't feel like spending more time on this PR. Would appreciate help on this part.

Do you think checking for the presence of this for all cases would help?

Yes, I do think that would be a good idea. However linting should fail if what is given is not a promise. This is why I changed the typehints to be correct and require promises in the typehints. This way linting would complain when non-promises are used here. Do you think that is enough or where you envisioning something more?

So you could also check the value of the awaited result, ensuring it is never null or undefined

Hmm, that might be possible, as the cache functions do have some return values. However we do now have typehints requiring them to return that type. so in theory linting should catch it ;)

Thanks a lot for the feedback and trying to also do the research to figure out if this indeed is a solution

@Lewiscowles1986
Copy link

Lewiscowles1986 commented Nov 20, 2021

Await on no test would look something like

it('should always return a promise', async () => {
    const result = await Promise.resolve(/* call to method not returning a promise that should */);
    expect(result).toBeTruthy();
});

TypeScript may be a complicating factor in this though. I've hit this a few times where I basically had to move a TypeScript project to JS because valid typescript with all it's supposed protections; prevented me from testing behavior that could be measured after transpiling to JS.

Maybe someone else has a tip I'm unaware of if this doesn't work.

Specifically that should fail for master, but pass for your branch.

@olivernybroe
Copy link
Author

So I think we can safely say that the solution here is working.
We have had no errors of the race condition in a whole week 馃憤

However @Lewiscowles1986 does have a valid point that we are missing test for it. However if I have to create the test it will take a while.
So I'll let @ramsey choose here, if they either can get someone else to do the test, do it themselves, or if they will merge it in without any test.

@kevinpapst
Copy link

This issue causes (only for me) at least once a day a failed build / the need to manually trigger "rebuild all".

As there is no test yet that fails even though there is a problem for at least some users, I'd argue that fixing the issue is more important than waiting for a test case for code which wasn't covered before.
Or in other words: this workflow is more or less "feature complete", so do you really expect that this problem will be reintroduced in the future? I don't see that many changes in the last year and no real need for in the future.

There is another question that I'd like to raise: should the dist files be included in the merge (good for testing) or rather being rebuild after merged by the repo maintainer (the user base trusts this repo)?

@ruudk
Copy link

ruudk commented Nov 25, 2021

I agree, I also stumble on this multiple times per week on different repositories.

@olivernybroe
Copy link
Author

@kevinpapst Agreed, I also agree that when/if this is being merged by @ramsey, he should rebuild it, so it's not my dist files 馃憤

@olivernybroe olivernybroe marked this pull request as ready for review November 25, 2021 13:14
@ruudk
Copy link

ruudk commented Dec 7, 2021

@ramsey Would it be possible to merge this? 馃檹

@kevinpapst
Copy link

@ramsey this is open source, it is your right to ignore our problems here. but I want to try it once again: could you please be so kind and share your plans about this topic. I am hitting that re-run all jobs literally every day and this last one was once too often 馃榿 I just want to know if we have a chance that this will be integrated and released anytime soon or if it is better that we are setting up our own forks of the action?

@ramsey
Copy link
Owner

ramsey commented Dec 13, 2021

@kevinpapst I've been waiting to see if @olivernybroe or anyone else would provide a test. You're welcome to run the action from your own fork.

@kevinpapst
Copy link

Thanks for your feedback @ramsey, and thanks for your action. It solves a problem for a certain target group in a neat way.
But as parallel builds are a problem and all my projects use matrix build I cannot wait any longer for a fix that exist but does not get merged.

For anyone interested, you can use the cache action combined with the speed improvements from Composer 2 to get the same build times (I could not see any speed difference to this action here).

All you have to do is to replace:

            -    name: Install dependencies
                 uses: ramsey/composer-install@v1

with

            -   name: Determine composer cache directory
                id: composer-cache
                run: "echo \"::set-output name=directory::$(composer config cache-dir)\""

            -   name: Cache Composer dependencies
                uses: actions/cache@v2
                with:
                    path: "${{ steps.composer-cache.outputs.directory }}"
                    key: ${{ runner.os }}-${{ matrix.php }}-${{ hashFiles('**/composer.lock') }}

            -   name: Install dependencies
                run: composer install

@Lewiscowles1986
Copy link

@ramsey I've not been able to recreate a race in local tests. This is my key reason for not attempting a test.

While I understand this might be frustrating, I think it's a wise choice to hold-off until we can make a reproductive case test, and then it can be confirmed why the flakes, and how they were fixed.

@ramsey
Copy link
Owner

ramsey commented Dec 23, 2021

GitHub now supports uses on steps in a composite action. They did not support this when I began developing this action, which is why I had to develop it as a JavaScript action.

Since I'm now able to develop this as a composite action, I've shifted away from JavaScript and have converted this to a composite action in v2.

As a side note: when I first began developing this, GitHub's cache action also was not a separate library intended for use by other actions. They've since changed this, as well, so if I couldn't switch to a composite action, I would have updated to the latest version of their library, which might have fixed this race condition, among other issues.

Now that I'm using GitHub's cache action directly as part of a composite action, I think any of the race conditions and inability to read cache files that folks were experiencing (due to my bad JavaScript code) should be cleared up.

Please upgrade using ramsey/composer-install@v2 or ramsey/composer-install@2.0.0. Details are available on the release announcement and the README.

Thanks for your PR and the good discussion here that encouraged me to make this change!

@ramsey ramsey closed this Dec 23, 2021
@olivernybroe
Copy link
Author

@ramsey amazing! Looking forward to changing to version 2 馃帀

Thanks for the work to change this to a composite action!

@owenvoke owenvoke deleted the fix-race-condition branch March 12, 2024 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants