
Coveralls reporting a decrease overall (using parallel), even though nothing changed #1653

Closed
auto-differentiation-dev opened this issue Jul 26, 2022 · 15 comments


@auto-differentiation-dev

Hi,

We're struggling to understand why Coveralls reports an overall coverage decrease for a PR, even though the detailed numbers below suggest otherwise (and none of the test code was changed). It seems that one of the parallel jobs is being reported as the overall result, rather than the aggregate.

Here is a screenshot for reference:

[Screenshot: Coveralls build page showing the reported overall coverage decrease]

This is the link: https://coveralls.io/builds/51121924
And it corresponds to this PR: auto-differentiation/xad#13

Any ideas?

@afinetooth (Collaborator) commented Jul 26, 2022

Hi @xcelerit-dev. Thanks for reporting your issue.

This is something that's been happening lately, for a handful of repos as far as we know (example 1 | example 2). Specifically, the behavior is that PRs will show a decrease in coverage that, by the evidence, should not occur. Then, when the jobs in the PR build are re-run, the PR's coverage % corrects itself.

Note these side-by-side comparisons between your base branch (LEFT) and your PR branch (RIGHT).

First, it shows the decrease you found suspect:
[Screenshot: base (LEFT) vs. PR (RIGHT) comparison showing the coverage decrease]

It looks suspect to me for the same reasons. In addition, while each individual job has the exact same coverage % as in the base build, the run details tell a different story in terms of the numbers:

Base:
[Screenshot: base build RUN DETAILS]

PR:
[Screenshot: PR build RUN DETAILS]

A build's coverage % is always derived from the numbers in its RUN DETAILS.

After I re-run the jobs in the PR build:
[Screenshot: base vs. PR comparison after re-running the PR build jobs]

We now see no change in coverage between base and PR, and the RUN DETAILS align in terms of relevant lines covered (hits per line change slightly, but that does not affect the coverage %).

Unfortunately, we don't know the root cause of this issue. For some users it is consistent, for some occasional. Is it occasional or persistent for you?

In your case, there is one thing I see wrong that I would like to correct to see if it has an impact (which could fix your builds, or otherwise be helpful in determining the root cause of the shared issue):

Your builds are not sending status updates. This can sometimes be caused by an expired OAuth token (your repo owner's, in this case), which can also cause other problems, like failed API calls to GitHub, and could potentially play a role in what's happening on the first run of the PR build.

To fix this I would normally reassign repo ownership to another user of the repo, ideally one with Admin access to the repo. But in your case, I see you are the only user, and that you have Admin access.

Unless there is another user you're aware of that we can test with, we'll have to find out why status updates aren't reaching Github with the permissions that are attached to your OAuth token.

Do you have any insight into why that may be?

Please give me a few moments to explore further and I'll get back with my own findings.

BTW, to set expectations, if resolving this issue doesn't fix your PR builds (meaning: stop future PR builds from reporting false coverage drops), then you are squarely in the group of repos having this problem with no current known cause. It means I'll have to add you to the ticket with those other projects and update you with any progress from there.

@afinetooth (Collaborator)

Ok, it turns out I was wrong. You were not the owner of your repo. You were the only user of the repo, but your repo had no owner. I made you the owner and re-ran your builds, and they started sending status updates. (You can check Github to verify status updates on your last three (3) builds.)

Please proceed and let me know, here, if any of your next PR builds also show an incorrect decrease in coverage %.

I'm not sure if this change will resolve that issue, but it would be great to know. The other cases, BTW, did not have this issue AFAIK.

@auto-differentiation-dev (Author)

Thank you for all your investigations - that was helpful. However, we see this happening again for the next PR:

While we could modify thresholds etc. to make sure we can keep working (get checks to pass), that's not a solution. The coverage badge will also keep fluctuating for no reason.

What do you suggest we do to get around this? Keep triggering new builds in the PRs until we get a consistent result?

@auto-differentiation-dev (Author)

We tried a rebuild of the whole workflow, but this only adds more jobs to the same coveralls build number and doesn't change the overall coverage percentage.

@afinetooth (Collaborator)

Yes, I'm sorry to say, you're squarely in the pattern of this known but currently unresolved issue.

Just to make sure, using this PR build as a test:
https://coveralls.io/builds/51200176

[Screenshot: PR build showing the incorrect coverage decrease]

I re-ran the jobs in the build and see the coverage % change / correct itself:

[Screenshot: the same build after re-running its jobs, with coverage corrected]

Sorry. I'll add you to the issue and share any updates.

@OndraM commented Sep 8, 2022

Hi, we are affected as well:

  • This build (which is a push-event build) reports "First build on feature/test-page-constants at 64.607%" in the header, and the "source files" section also shows a consistent 64.61% coverage for lib/.
  • However, this build, which is the PR build for the same branch and the same code, reports "coverage decreased (-29.0003%) to 35.606%". The "source files" section here still shows 64.61% coverage for lib/ with "no change" reported. So these two numbers are inconsistent for some reason.

We are getting this behavior for all pull request builds now. However, push builds report the coverage correctly.

@afinetooth (Collaborator)

Hi, @OndraM.

Thanks for the report. It sounds like we'll need to add you to this issue, which I'll do as soon as I'm able to verify that there isn't another underlying cause in your case.

We're experiencing an extremely high volume of support requests this week and are working through a backlog. We'll respond to your issue as soon as we possibly can.

Thanks.

@Torgen commented Sep 17, 2022

If more data points help, I think https://github.com/Torgen/codex-blackboard also has this issue. The Coveralls site shows the aggregated coverage, but the comment and check result show the lowest single sub-result of the parallel runs.

@afinetooth (Collaborator)

@xcelerit-dev @OndraM @Torgen

Workaround for this issue: the Rerun Build Webhook

While we've not yet identified a fix for this issue, we released a workaround today that should resolve it for you: the Rerun Build Webhook.

The nature of the issue appears to be that, for some repos with parallel builds:

  1. The coverage % of pull_request builds is calculated incorrectly (on the first build), but
  2. Re-running the build (recalculating coverage) corrects the coverage %.

So the Rerun Build Webhook, similar to the (Close) Parallel Build Webhook, fixes the issue by triggering your build to re-calculate itself.

Instructions

Call this at the end of your CI config, after calling the (Close) Parallel Build Webhook.

Call it like this:

curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=<YOUR REPO TOKEN>&build_num=<YOUR BUILD NUMBER>'

Substitute your repo_token and your build_num (the same value you used for build_num in your (Close) Parallel Build Webhook).

Please note a few differences between the Rerun Build Webhook and the (Close) Parallel Build Webhook:

  1. The /rerun_build endpoint will accept a GET or a POST, and
  2. The only two required URL params are your repo_token and build_num, and build_num is a regular URL param rather than part of a "payload" body as required by the (Close) Parallel Build Webhook:

curl -k https://coveralls.io/webhook?repo_token=<YOUR REPO_TOKEN> -d "payload[build_num]=<YOUR BUILD NUMBER>&payload[status]=done"
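Put together, the tail end of a parallel CI config would call both webhooks in order. Here is a minimal shell sketch; REPO_TOKEN and BUILD_NUM are placeholders for whatever your CI system exposes, and they must be the same values used for the (Close) Parallel Build Webhook:

# 1. Close the parallel build as usual:
curl -k "https://coveralls.io/webhook?repo_token=${REPO_TOKEN}" -d "payload[build_num]=${BUILD_NUM}&payload[status]=done"

# 2. Then trigger the rerun workaround:
curl --location --request GET "https://coveralls.io/rerun_build?repo_token=${REPO_TOKEN}&build_num=${BUILD_NUM}"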

@auto-differentiation-dev (Author) commented Sep 20, 2022

@afinetooth Thank you for the description. We'd like to test this, but we're wondering how to get the build_num within a GitHub workflow YAML file. This is our ci.yml, and I guess the workaround would mean inserting the following extra step into the coverage_finish job:

- name: Rerun coverage workaround
  run: |
    curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=XXXX&build_num=YYYY'

The repo token (XXXX) is simple to get, but how would we get the YYYY? We're using the Coveralls GitHub Action.

@auto-differentiation-dev (Author)

We worked out the required change. After adding a secret for COVERALLS_REPO_TOKEN, the following workflow step appears to work fine: https://github.com/xcelerit/XAD/blob/main/.github/workflows/ci.yml#L324-L327
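Roughly, the step looks like this (a sketch rather than a verbatim copy of the linked file; it assumes the repo token is stored in the COVERALLS_REPO_TOKEN secret and uses the runner's default GITHUB_RUN_ID environment variable as the build number):

- name: Rerun coverage workaround
  # GITHUB_RUN_ID is set by default on GitHub Actions runners and matches the
  # build_num that the Coveralls GitHub Action uses for the build.
  run: |
    curl --location --request GET "https://coveralls.io/rerun_build?repo_token=${{ secrets.COVERALLS_REPO_TOKEN }}&build_num=$GITHUB_RUN_ID"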

@afinetooth (Collaborator) commented Sep 20, 2022

@xcelerit-dev apologies. I forgot that some Coveralls integrations (like the Coveralls GitHub Action) have special features around the (Close) Parallel Build Webhook, such that you don't need to build the request.

You got it right. The build_num is ${{ env.GITHUB_RUN_ID }} for the Coveralls GitHub Action.

This came up right away after I posted the solution here as well, so you can see a longer explanation there.

@fabiode commented Jul 11, 2023

@afinetooth Hi, thanks for the proposed workaround. It works sometimes, but it also seems to pick up a random result from one of the parallel runs, with an even lower coverage, as the canonical value for the entire build.

Before the workaround we saw changes of about ±0.3%; now the reported change can reach -38.9%.

I'm not sure if I did something wrong, but I added the rerun_build request right after the parallel build webhook:

- name: Rerun coverage workaround
  run: |
    curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=XXXX&build_num=YYYY'

Here, XXXX and YYYY are replaced with CircleCI env vars.
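For reference, the CircleCI step looks roughly like this (a sketch with assumed names: COVERALLS_REPO_TOKEN is a project environment variable, and CIRCLE_WORKFLOW_ID is used on the assumption that the same value was passed as build_num to the parallel build webhook):

- run:
    name: Rerun coverage workaround
    command: |
      # Must use the same build_num as the (Close) Parallel Build Webhook call.
      curl --location --request GET "https://coveralls.io/rerun_build?repo_token=${COVERALLS_REPO_TOKEN}&build_num=${CIRCLE_WORKFLOW_ID}"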

I was wondering if anyone encountered this oscillation after the workaround.

@afinetooth (Collaborator)

@fabiode Can you share the URLs for 2-3 recent builds where this is happening? I feel like recent changes to our code should make the original issue far less likely to begin with, but I'd also like to verify that we received your rerun webhook, etc. If your repo is private or sensitive, please email us at support@coveralls.io and mention this issue; I will get it.

@afinetooth (Collaborator)

@fabiode Got your request to support@coveralls.io and will reply there. Thanks.
