Help understanding coverage decrease #1632

Open
jinhong- opened this issue May 6, 2022 · 20 comments

@jinhong-

jinhong- commented May 6, 2022

I'm struggling to understand why coverage reports a decrease when there is no change in code that affects coverage. Screenshot attached below. The number reported in the Tree is different from the coverage decrease reported.

[Screenshot]

@afinetooth
Collaborator

afinetooth commented May 6, 2022

hi @jinhong-,

I took a guess at the project and build you're referencing above. (I'm using it here since only you and your team members can access that link.)

Looking at the PR build (RIGHT), compared to the PR's base build (LEFT):

[Screenshot: 2022-05-06, 10:28:58 AM]

I'm also confused by the coverage change, since I don't see any indicator of that difference in the files themselves, only in the RUN DETAILS, which correlate with the change.

But I suspect those numbers are wrong and that the build may have been corrupted, so I re-ran the parallel jobs in the order they arrived, and got a new result (a new coverage calculation), which confirms that.

There is now NO CHANGE in coverage between the two builds:

[Screenshot: 2022-05-06, 10:32:20 AM]

I'm afraid there's little to go on when it comes to one-off corrupted builds, in terms of determining root cause. We would need to see a pattern, so please let us know, here or at support@coveralls.io, if this kind of result persists.

In the meantime, to answer your other question:

The number reported in the Tree is different from the coverage decrease reported

The reason is that the FILES section displays only the line coverage from your project's coverage report, whereas your Coveralls repo, as configured in its SETTINGS, also tracks the branch coverage in your coverage reports and factors it into the total (aggregate) coverage calculation.

The aggregate coverage is reflected in RUN DETAILS, and you'll note that yours includes branch coverage details:

[Screenshot: 2022-05-06, 10:41:14 AM]

Here's the formula for aggregate coverage when branch coverage is included:

aggregate coverage w/ branch coverage = (lines hit + branches hit) / (relevant lines + relevant branches)
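
For example, with purely hypothetical numbers (not taken from this build), say a report with 900 of 1,000 relevant lines hit and 150 of 200 relevant branches hit:

line coverage only           = 900 / 1000                 = 90.0%
aggregate w/ branch coverage = (900 + 150) / (1000 + 200) = 1050 / 1200 = 87.5%

So the FILES tree (line coverage only) can show 90.0% while the aggregate coverage reported for the build is 87.5%.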

And here's the branch coverage setting in your project SETTINGS:

[Screenshot: 2022-05-06, 10:48:16 AM]

@jinhong-
Author

jinhong- commented May 9, 2022

Thanks! We will keep observing. The misreporting of coverage seems to be happening fairly frequently. Does the order of parallel execution matter?
Also, is there an explanation of what branch coverage means?

@jinhong-
Author

jinhong- commented May 9, 2022

Here's another build that failed with 0% again:
https://coveralls.io/builds/48934608

@afinetooth
Collaborator

afinetooth commented May 9, 2022

@jinhong- Yes, I see the same behavior again. But I don't see any underlying reason for it. There is nothing out of order with your coverage posts, or how they came in. They are in a different order in the PR, but we're aware of that. (Coveralls knows that the previous job for build 2 in the PR build is job 3 in the base build, etc.)

The "reasoning" for the drop in coverage comes from the RUN DETAILS, and you can see how that is purported to change between the base build (LEFT) and the PR build (RIGHT), here:

[Screenshot: 2022-05-09, 3:58:18 PM]

Those RUN DETAILS are supposed to be the unmodified details from your coverage reports.

So the first thought about root cause is an issue with your reports. But I can eliminate that if I re-run your PR build and get more accurate results, like last time.

Which I did and...

[Screenshot: 2022-05-09, 4:02:14 PM]

Again, the RUN DETAILS changed after Coveralls re-consumed each of your coverage reports (jobs).

So we have our pattern. But unfortunately I still can't name the cause.

Obviously, something is interfering with the consumption of coverage reports the first time around.

The next diagnostic step would be for you to run your next builds in verbose mode and share your CI build logs with me (at least the portions related to Coveralls).

I know your project is private, so feel free to share those to support@coveralls.io and just mention this issue. I will look for them.

Here's how to enable verbose mode for the Coveralls GitHub Action:

  1. Add this environment variable to your GHA config yaml so it's available to your Coveralls step(s): NODE_COVERALLS_DEBUG=1. (The NODE_ part is there because the Coveralls GitHub Action runs the node-coveralls integration under the hood.) See the sketch after this list.
  2. Make sure you're getting verbose output in your CI build logs. You'll see a bunch of lines starting with "Debug", so it should be clear.
  3. Share the verbose build logs for all three (3) parallel jobs, and for your parallel build close webhook. So you'll share four (4) CI build logs total with me, at support@coveralls.io.
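
For reference, a minimal sketch of what step 1 might look like in a workflow yaml. The step name, flag-name, and other inputs here are hypothetical placeholders that will differ in your setup; the point is only the NODE_COVERALLS_DEBUG env var:

# Hypothetical Coveralls step with debug output enabled
- name: Coveralls Parallel
  uses: coverallsapp/github-action@master
  env:
    NODE_COVERALLS_DEBUG: 1   # makes the action print verbose "Debug" lines in the CI log
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    flag-name: run-1
    parallel: true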

Thanks.

@jinhong-
Author

I have sent the logs over to you

@afinetooth
Collaborator

@jinhong- Got them, thanks. Will reply by email but backfill any details here that will help others.

@dhui

dhui commented May 14, 2022

I believe we've been seeing a similar issue for a while now, where builds incorrectly report a -0.0% decrease in coverage even though the aggregate coverage has not actually decreased. I've changed the Coverage Decrease Threshold for failure for the repo to 0.1% as a workaround.

Example:

@afinetooth
Collaborator

Thanks, @dhui. @jinhong- is that a viable workaround for you for the time being?

In your SETTINGS, you would enter 0.1 into the COVERAGE DECREASE THRESHOLD FOR FAILURE field, like so:

[Screenshot: 2022-05-16, 10:56:22 AM]

We are trying to determine what kind of debug info / monitoring would help us understand what's happening with your initial builds that don't calculate properly.

@jinhong-
Author

Unfortunately that may not help us, as in our case the coverage seems to drop to zero.

@jinhong-
Author

@afinetooth how are you re-running the tests in the order they arrived? Are you able to expose this functionality so I can trigger re-runs myself? I am facing this issue fairly frequently.

@jinhong-
Author

My theory is that the analysis is timing out/erroring out on your backend, and the behavior of timing out is to have coverage reported at 0%. I observed that the results took longer than usual to arrive. I am assuming there is some background processing involved

@afinetooth
Collaborator

afinetooth commented May 18, 2022

@jinhong- Unfortunately it's not something I can expose for you to trigger. Right now, it's just an internal command I can execute via dev console, so not available via API or anything. It is planned for future release, but probably not on a timeline to be of use here.

Your theory is reasonable. To test it, I ran a report of your last 100 builds (attached; it's anonymized), and your build times all look normal, except for the original build ID referenced above: 48890929. It's one of the only builds with a longer build time, and in that case the build time is an extreme outlier.

last_100_builds_20220518.csv

Maybe you can look through the file and see if the IDs of any more of your problem builds match longer build times. (I don't really see any builds that took as long as the one mentioned, though.) Note that the build ID is what's displayed in the URL for your build, not the label given by your CI service that appears on your build pages.

@jinhong-
Author

My theory is that the analysis is timing out/erroring out on your backend, and the behavior of timing out is to have coverage reported at 0%. I observed that the results took longer than usual to arrive. I am assuming there is some background processing involved

Also, I noticed the exact same PR/build (2345857520) would first fail with 0%, then pass afterwards. Did you trigger a re-run for build 2345857520?

A few questions on the CSV file:

  1. Does the build time represent how long the server takes to process, and/or the time taken for the first of N parallel requests posted to Coveralls?
  2. I noticed there are duplicate builds. Is each build represented by each trigger from GitHub Actions in this case, or by each parallel request posted to Coveralls?
  3. I noticed there are many incomplete builds. What do those mean?

@chapayevdauren

Having the same issue:
[Screenshot: ergeon:srv-ergeon | Build #3c114c30-b0bc-43a2-9638-8cc0f0e7d896 | Coveralls - Test Coverage History & Statistics, 2022-05-19 18:06:19]

@afinetooth
Collaborator

Hi @chapayevdauren. I'll need to know the Coveralls URL for your repo, or the URL for the problematic build.

If it's private, or sensitive, please email support@coveralls.io and mention this issue. I'll get it and reply.

@afinetooth
Collaborator

afinetooth commented May 23, 2022

@chapayevdauren — replied in email.

@dhui

dhui commented May 23, 2022

> Thanks, @dhui. @jinhong- is that a viable workaround for you for the time being?
>
> In your SETTINGS, you would enter 0.1 into the COVERAGE DECREASE THRESHOLD FOR FAILURE field, like so:
>
> [Screenshot: 2022-05-16, 10:56:22 AM]
>
> We are trying to determine what kind of debug info / monitoring would help us understand what's happening with your initial builds that don't calculate properly.

@afinetooth
I triggered a re-run after setting the Coverage Decrease Threshold for failure for the repo to 0.1%, and now the commit status shows as passing. This temporary workaround should work for us for now since the codebase isn't huge: with 6.5K LOC, 0.1% would mean that ~6 LOC could lose coverage and the coverage checks would still pass. Could we set this value lower, e.g. 0.01% or 0.001%?

@afinetooth
Collaborator

Thanks @dhui, for the update.

@dhui and @chapayevdauren, I also have an update:
We are seeing this pattern in several other customer repos right now: intermittent PR builds showing 0% coverage, caused by an incorrect aggregate coverage calculation, which is corrected by re-running / re-playing the original jobs.

We don't currently understand the root cause, but we think it may be due to some recent issues described on our status page:
https://status.coveralls.io/

That said, the expected behavior in that situation would be for builds to take longer than normal, not to complete incorrectly. So if the above is the cause, it's for a different reason, such as the calculation job failing before it can obtain its data due to a timeout, etc.

Will share updates here.

@afinetooth
Collaborator

@jinhong- @dhui @chapayevdauren

Workaround for this issue: the Rerun Build Webhook

While we've not yet identified a fix for this issue, we released a workaround today that should resolve it for you: the Rerun Build Webhook.

The nature of the issue appears to be that, for some repos with parallel builds:

  1. The coverage % of pull_request builds is calculated incorrectly (on the first build), but
  2. Re-running the build (recalculating coverage) corrects the coverage %.

So a Rerun Build Webhook, similar to the (Close) Parallel Build Webhook, fixes the issue by triggering your build to recalculate itself.

Instructions

Call this at the end of your CI config, after calling the (Close) Parallel Build Webhook.

Call it like this:

curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=<YOUR REPO TOKEN>&build_num=<YOUR BUILD NUMBER>'

But substitute your repo_token, and your build_num (the same value you used for build_num in your (Close) Parallel Build Webhook).

Please note a few differences between the Rerun Build Webhook and the (Close) Parallel Build Webhook:

  1. The /rerun_build endpoint will accept a GET or a POST, and
  2. The only two required URL params are your repo_token and build_num, and build_num is a regular URL param rather than part of the "payload" body required by the (Close) Parallel Build Webhook:
curl -k https://coveralls.io/webhook?repo_token=<YOUR REPO_TOKEN> -d "payload[build_num]=<YOUR BUILD NUMBER>&payload[status]=done"
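
Put together, the tail end of a parallel CI run might look something like this sketch, where COVERALLS_REPO_TOKEN and BUILD_NUMBER are hypothetical placeholder variables set from your own repo token and the same build_num value you already pass to the close webhook:

# 1. Close the parallel build (the existing (Close) Parallel Build Webhook)
curl -k "https://coveralls.io/webhook?repo_token=${COVERALLS_REPO_TOKEN}" \
  -d "payload[build_num]=${BUILD_NUMBER}&payload[status]=done"

# 2. Then trigger a coverage recalculation with the Rerun Build Webhook
curl --location --request GET \
  "https://coveralls.io/rerun_build?repo_token=${COVERALLS_REPO_TOKEN}&build_num=${BUILD_NUMBER}"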

@afinetooth
Collaborator

afinetooth commented Sep 20, 2022

NOTE: In case you're having trouble determining what build_num is for your project, I posted some follow-up here.

If you're using a different Coveralls integration and/or are still having trouble determining the correct values for either build_num or repo_token, let me know here, in the context of your issue, or at support@coveralls.io.
