
Coveralls reporting a decrease overall (using parallel), even though nothing changed #1653

Closed
auto-differentiation-dev opened this issue Jul 26, 2022 · 15 comments


@auto-differentiation-dev

Hi,

We're struggling to understand why Coveralls reports an overall coverage decrease for a PR, even though the detailed numbers below suggest otherwise (and none of the test code was changed). It seems that one of the parallel jobs is being reported as the overall result, rather than the aggregate.

Here is a screenshot for reference:

[Screenshot: Coveralls build page showing the reported overall coverage decrease]

This is the link: https://coveralls.io/builds/51121924
And it corresponds to this PR: auto-differentiation/xad#13

Any ideas?

@afinetooth (Collaborator) commented Jul 26, 2022

Hi @xcelerit-dev. Thanks for reporting your issue.

This is something that's been happening lately, for a handful of repos as far as we know (example 1 | example 2). Specifically, the behavior is that PRs will show a decrease in coverage that, by the evidence, should not occur. Then, when the jobs in the PR build are re-run, the PR's coverage % corrects itself.

Note these side-by-side comparisons between your base branch (LEFT) and your PR branch (RIGHT).

First, it shows the decrease you found suspect:
[Screenshot: base (LEFT) vs. PR (RIGHT) comparison showing the coverage decrease]

It looks suspect to me for the same reasons. In addition, while each individual job has the exact same coverage % as in the base build, the run details tell a different story in terms of the numbers:

Base:
[Screenshot: base build RUN DETAILS]

PR:
[Screenshot: PR build RUN DETAILS]

A build's coverage % is always derived from the numbers in its RUN DETAILS.

After I re-run the jobs in the PR build:
[Screenshot: base vs. PR comparison after re-running the PR build jobs]

We now see no change in coverage between base and PR, and the RUN DETAILS align in terms of relevant lines covered (hits per line change slightly, but that does not affect the coverage %).

Unfortunately, we don't know the root cause of this issue. For some users it is consistent, for some occasional. Is it occasional or persistent for you?

In your case, there is one thing I see wrong that I would like to correct to see if it has an impact (which could fix your builds, or otherwise be helpful in determining the root cause of the shared issue):

Your builds are not sending status updates. This can sometimes be caused by an expired OAuth token (your repo owner's, in this case), which can also cause other problems, like failed API calls to GitHub, and could potentially play a role in what's happening on the first run of the PR build.

To fix this I would normally reassign repo ownership to another user of the repo, ideally one with Admin access to the repo. But in your case, I see you are the only user, and that you have Admin access.

Unless there is another user you're aware of that we can test with, we'll have to find out why status updates aren't reaching Github with the permissions that are attached to your OAuth token.

Do you have any insight into why that may be?

Please give me a few moments to explore further and I'll get back with my own findings.

BTW, to set expectations, if resolving this issue doesn't fix your PR builds (meaning: stop future PR builds from reporting false coverage drops), then you are squarely in the group of repos having this problem with no current known cause. It means I'll have to add you to the ticket with those other projects and update you with any progress from there.

@afinetooth (Collaborator)

Ok, it turns out I was wrong. You were not the owner of your repo. You were the only user of the repo, but your repo had no owner. I made you the owner and re-ran your builds, and they started sending status updates. (You can check Github to verify status updates on your last three (3) builds.)

Please proceed and let me know, here, if any of your next PR builds also show an incorrect decrease in coverage %.

I'm not sure if this change will resolve that issue, but it would be great to know. The other cases, BTW, did not have this issue AFAIK.

@auto-differentiation-dev (Author)

Thank you for all your investigations - that was helpful. However, we see this happening again for the next PR:

While we could modify thresholds etc. to make sure we can keep working (get checks to pass), that's not a solution. The coverage badge will also keep fluctuating for no reason.

What do you suggest we do to get around this? Keep triggering new builds in the PRs until we get a consistent result?

@auto-differentiation-dev (Author)

We tried a rebuild of the whole workflow, but this only adds more jobs to the same coveralls build number and doesn't change the overall coverage percentage.

@afinetooth (Collaborator)

Yes, I'm sorry to say, you're squarely in the pattern of this known but currently unresolved issue.

Just to make sure, using this PR build as a test:
https://coveralls.io/builds/51200176

[Screenshot: PR build showing the incorrect coverage decrease]

I re-ran the jobs in the build and see the coverage % change / correct itself:

[Screenshot: the same build after re-running its jobs, with coverage corrected]

Sorry. I'll add you to the issue and share any updates.

@OndraM commented Sep 8, 2022

Hi, we are affected as well:

  • This build (which is a push-event build) reports "First build on feature/test-page-constants at 64.607%" in the header, and the "source files" section also shows a consistent 64.61% coverage for lib/.
  • However, this build, which is the PR build for the same branch and the same code, reports "coverage decreased (-29.0003%) to 35.606%". The "source files" section here still shows 64.61% coverage for lib/ with "no change" reported. So these two numbers are inconsistent for some reason.

We are getting this behavior for all pull request builds now. However, push builds report the coverage correctly.

@afinetooth (Collaborator)

Hi, @OndraM.

Thanks for the report. It sounds like we'll need to add you to this issue, which I'll do as soon as I'm able to verify that there isn't another underlying cause in your case.

We're experiencing an extremely high volume of support requests this week and are working through a backlog. We'll respond to your issue as soon as we possibly can.

Thanks.

@Torgen commented Sep 17, 2022

If more data points help, I think https://github.com/Torgen/codex-blackboard also has this issue. The Coveralls site shows the aggregated coverage, but the comment and check result show the lowest single sub-result of the parallel runs.

@afinetooth (Collaborator)

@xcelerit-dev @OndraM @Torgen

Workaround for this issue: the Rerun Build Webhook

While we've not yet identified a fix for this issue, we released a workaround today that should resolve it for you: the Rerun Build Webhook.

The nature of the issue appears to be that, for some repos with parallel builds:

  1. The coverage % of pull_request builds is calculated incorrectly (on the first build), but
  2. Re-running the build (recalculating coverage) corrects the coverage %.

So the Rerun Build Webhook, similar to the (Close) Parallel Build Webhook, fixes the issue by triggering your build to re-calculate itself.

Instructions

Call this at the end of your CI config, after calling the (Close) Parallel Build Webhook.

Call it like this:

curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=<YOUR REPO TOKEN>&build_num=<YOUR BUILD NUMBER>'

Substitute your repo_token and your build_num (the same value you used for build_num in your (Close) Parallel Build Webhook).

Please note a few differences between the Rerun Build Webhook and the (Close) Parallel Build Webhook:

  1. The /rerun_build endpoint will accept a GET or a POST, and
  2. The only two required URL params are your repo_token and build_num, and build_num is a regular URL param rather than part of a "payload" body as required by the (Close) Parallel Build Webhook:

curl -k https://coveralls.io/webhook?repo_token=<YOUR REPO_TOKEN> -d "payload[build_num]=<YOUR BUILD NUMBER>&payload[status]=done"
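Put together, the tail end of a parallel CI config would call both webhooks in order. Here is a minimal shell sketch; REPO_TOKEN and BUILD_NUM are placeholders for whatever your CI system exposes, and they must be the same values used for the (Close) Parallel Build Webhook:

# 1. Close the parallel build as usual:
curl -k "https://coveralls.io/webhook?repo_token=${REPO_TOKEN}" -d "payload[build_num]=${BUILD_NUM}&payload[status]=done"

# 2. Then trigger the rerun workaround:
curl --location --request GET "https://coveralls.io/rerun_build?repo_token=${REPO_TOKEN}&build_num=${BUILD_NUM}"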

@auto-differentiation-dev (Author) commented Sep 20, 2022

@afinetooth Thank you for the description. We'd like to test this, but we're wondering how to get the build_num within a GitHub workflow YAML file. This is our ci.yml, and I guess the workaround would mean inserting the following extra step into the coverage_finish job:

- name: Rerun coverage workaround
  run: |
    curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=XXXX&build_num=YYYY'

The repo token (XXXX) is simple to get, but how would we get the YYYY? We're using the Coveralls GitHub Action.

@auto-differentiation-dev (Author)

We worked out the required change. After adding a secret for COVERALLS_REPO_TOKEN, the following workflow step appears to work fine: https://github.com/xcelerit/XAD/blob/main/.github/workflows/ci.yml#L324-L327
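Roughly, the step looks like this (a sketch rather than a verbatim copy of the linked file; it assumes the repo token is stored in the COVERALLS_REPO_TOKEN secret and uses the runner's default GITHUB_RUN_ID environment variable as the build number):

- name: Rerun coverage workaround
  # GITHUB_RUN_ID is set by default on GitHub Actions runners and matches the
  # build_num that the Coveralls GitHub Action uses for the build.
  run: |
    curl --location --request GET "https://coveralls.io/rerun_build?repo_token=${{ secrets.COVERALLS_REPO_TOKEN }}&build_num=$GITHUB_RUN_ID"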

@afinetooth (Collaborator) commented Sep 20, 2022

@xcelerit-dev apologies. I forgot that some Coveralls integrations (like the Coveralls GitHub Action) have special features around the (Close) Parallel Build Webhook, such that you don't need to build the request.

You got it right. The build_num is ${{ env.GITHUB_RUN_ID }} for the Coveralls GitHub Action.

This came up right away after I posted the solution here as well, so you can see a longer explanation there.

@fabiode commented Jul 11, 2023

@afinetooth Hi, thanks for the proposed workaround. It works sometimes, but it also seems to pick up a random result from one of the parallel runs, with an even lower coverage, as the canonical value for the entire build.

Before the workaround we saw changes of about ±0.3%; now the reported change can reach -38.9%.

I'm not sure if I did something wrong, but I added the rerun_build request right after the parallel build webhook:

- name: Rerun coverage workaround
  run: |
    curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=XXXX&build_num=YYYY'

Here, XXXX and YYYY are replaced with CircleCI env vars.
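For reference, the CircleCI step looks roughly like this (a sketch with assumed names: COVERALLS_REPO_TOKEN is a project environment variable, and CIRCLE_WORKFLOW_ID is used on the assumption that the same value was passed as build_num to the parallel build webhook):

- run:
    name: Rerun coverage workaround
    command: |
      # Must use the same build_num as the (Close) Parallel Build Webhook call.
      curl --location --request GET "https://coveralls.io/rerun_build?repo_token=${COVERALLS_REPO_TOKEN}&build_num=${CIRCLE_WORKFLOW_ID}"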

I was wondering if anyone encountered this oscillation after the workaround.

@afinetooth (Collaborator)

@fabiode Can you share the URLs for 2-3 recent builds where this is happening? I feel like recent changes to our code should make the original issue far less likely to begin with, but I'd also like to verify that we received your rerun webhook, etc. If your repo is private or sensitive, please email us at support@coveralls.io and mention this issue; I will get it.

@afinetooth (Collaborator)

@fabiode Got your request to support@coveralls.io and will reply there. Thanks.
