Add a structured "report", probably JSON #720

Closed
nedbat opened this issue Oct 9, 2018 · 16 comments
Labels
enhancement New feature or request fixed

Comments

@nedbat
Owner

nedbat commented Oct 9, 2018

There are lots of use cases that want detailed data from coverage. The only way they have to get it today is from the XML report, which is limited by being a Java-native format.

A new report that included all of the information in a convenient machine-readable form would enable more after-tools to be built for coverage. @atodorov started work on a JSON report here: https://bitbucket.org/ned/coveragepy/pull-requests/61/add-json-command/diff

@atodorov
Contributor

atodorov commented Oct 9, 2018

@nedbat it's been a while, but can you summarize the status of the code on the BitBucket pull request?

If there are only a few things left to do (like resolving conflicts and minor issues) I may be able to pick it up where I left it and send a PR here on GitHub.

@nedbat
Owner Author

nedbat commented Oct 9, 2018

@atodorov to be honest, I haven't looked at it closely in a while, but thanks. If you move it here, we can start reviewing it.

@Bachmann1234
Contributor

I was curious, and it looked like this issue was stale, so I started futzing with this by branching off the 4.x line: https://github.com/Bachmann1234/coveragepy/tree/json_report_4

My POC basically duplicated the XML report, replaced all the XML talk with dict calls, and called json.dumps.

The resulting report looks like https://gist.github.com/Bachmann1234/6347d0db7437e71323a2c391e3031ac0 (assuming I did it right. It's a spike right now and would need more tests/verification).

I think doing this the right way, though, would involve much less duplication. Perhaps one step creates a dictionary, the JSON report dumps it, and the XML report uses that same dictionary to drive its rendering.

Though ultimately it comes down to the desires for the json report I guess? Is keeping it close to the cobertura xml appropriate? 🤷‍♂

Anyway, I'll keep playing with this.
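
The shared-dictionary idea in the comment above could be sketched roughly as follows. This is a minimal illustration, not coverage.py's code: `gather`, `render_json`, `render_xml`, the input shape, and the XML element names are all made up for the example.

```python
import json
import xml.etree.ElementTree as ET


def gather(file_lines):
    """Collect results into one plain dictionary shared by every renderer.

    `file_lines` is a hypothetical input: {filename: (statements, executed)}.
    """
    report = {"files": {}}
    for name, (statements, executed) in sorted(file_lines.items()):
        statements, executed = set(statements), set(executed)
        report["files"][name] = {
            "executed_lines": sorted(statements & executed),
            "missing_lines": sorted(statements - executed),
        }
    return report


def render_json(report):
    """The JSON report simply dumps the shared dictionary."""
    return json.dumps(report, indent=4, sort_keys=True)


def render_xml(report):
    """The XML report walks the same dictionary (element names are illustrative)."""
    root = ET.Element("coverage")
    for name, data in report["files"].items():
        file_el = ET.SubElement(root, "file", name=name)
        hits = [(n, 1) for n in data["executed_lines"]]
        hits += [(n, 0) for n in data["missing_lines"]]
        for number, hit in sorted(hits):
            ET.SubElement(file_el, "line", number=str(number), hits=str(hit))
    return ET.tostring(root, encoding="unicode")


report = gather({"example.py": ([1, 2, 4, 5, 7, 8], [1, 2, 4, 7, 8])})
print(render_json(report))
print(render_xml(report))
```

Both renderers read the same `report` object, so a test can check the gathered data once instead of parsing each output format.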

@nedbat
Owner Author

nedbat commented Jul 9, 2019

@Bachmann1234 thanks, I think people would appreciate a supported structured report format. It's probably ok to be working from the 4.x line, since the code you are looking at hasn't changed much. I think you are right that a JSON report and the XML report might share a data gathering phase, and then write the data differently.

I recently did something similar to the HTML report in 5.x, which made it easier to test.

BUT: I don't want to follow the Cobertura schema: they focus on "classes", which isn't right for Python. Let's get a data format that is right for coverage.py, and then assess how well a common data gathering phase would work. Don't for

And we need to deal with the new data in 5.0: contexts, including dynamic contexts. It's not clear how best to handle it, because it could balloon quickly. Should a JSON report be compact (in some kind of 3rd-normal form) or convenient (which could involve duplicating a lot of data)?

I'm really glad you are looking into this. Let's keep talking through the design, it will make a lot of people happy to have something like this.
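
The compact-versus-convenient question for contexts can be made concrete with a small sketch. Everything here is hypothetical (the sample data, the function names, and both layouts); it only shows how the two shapes differ, not what coverage.py emits.

```python
import json

# Hypothetical measurements: for each file, which contexts executed each line.
line_contexts = {
    "example.py": {
        1: ["test_a", "test_b"],
        2: ["test_a", "test_b"],
        4: ["test_b"],
    }
}


def convenient(data):
    """Repeat the context names on every line: easy to read, but it grows quickly."""
    return {
        fname: {str(line): sorted(ctxs) for line, ctxs in lines.items()}
        for fname, lines in data.items()
    }


def compact(data):
    """Store each context name once and refer to it by index (normalized form)."""
    names = sorted(
        {c for lines in data.values() for ctxs in lines.values() for c in ctxs}
    )
    index = {name: i for i, name in enumerate(names)}
    files = {
        fname: {
            str(line): sorted(index[c] for c in ctxs) for line, ctxs in lines.items()
        }
        for fname, lines in data.items()
    }
    return {"contexts": names, "files": files}


print(json.dumps(convenient(line_contexts), indent=4))
print(json.dumps(compact(line_contexts), indent=4))
```

With long dynamic context names and many lines, the compact form saves space at the cost of an extra lookup for every consumer.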

@Bachmann1234
Contributor

I'll keep thinking out loud in case life happens and I fade away.

So Python has packages, modules, classes, functions, and lines.

Not sure how to handle the recursive nature quite yet. (Functions can contain functions...) My instinct is that trying to provide that much detail may just make the report hard to use.

Maybe just focusing on package, module, line?

I'll take a look at the HTML and console report and see if they are better guides.

@nedbat
Owner Author

nedbat commented Jul 9, 2019

Internally, coverage.py deals with files, lines, arcs (jumps from line to line for branch coverage), and contexts. For a start, the JSON report only has to do a good job getting the existing data out in a supported structured form.
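
As a rough picture of those four kinds of data, here is a hypothetical container and a function that turns it into something `json.dumps` accepts. `FileData` and `to_jsonable` are invented for this sketch and are not coverage.py classes or functions.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FileData:
    """Hypothetical per-file record: executed lines, arcs, and contexts."""

    lines: set = field(default_factory=set)       # executed line numbers
    arcs: set = field(default_factory=set)        # (from_line, to_line) pairs
    contexts: dict = field(default_factory=dict)  # line number -> set of context names


def to_jsonable(measured):
    """Convert {filename: FileData} into lists and dicts that JSON can hold."""
    out = {}
    for filename, data in measured.items():
        out[filename] = {
            "executed_lines": sorted(data.lines),
            "arcs": [list(arc) for arc in sorted(data.arcs)],
            "contexts": {
                str(n): sorted(names) for n, names in sorted(data.contexts.items())
            },
        }
    return out


measured = {
    "example.py": FileData(
        lines={1, 2, 4},
        arcs={(-1, 1), (1, 2), (2, 4), (4, -1)},
        contexts={1: {"test_a"}, 2: {"test_a"}, 4: {"test_a", "test_b"}},
    )
}
text = json.dumps(to_jsonable(measured), indent=4)
print(text)
```

Sets become sorted lists and integer keys become strings, because JSON has neither sets nor non-string object keys.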

@Bachmann1234
Contributor

Bachmann1234 commented Jul 9, 2019 via email

@Bachmann1234
Contributor

Bachmann1234 commented Jul 10, 2019

Quick update: based on feedback, I updated my branch to be based on master (rather than 4.x; I fixed the issue I was having) and made a simplified JSON report. The branch lives here: https://github.com/Bachmann1234/coveragepy/tree/json_report

Currently it's just built using the public api. Next I need details that are not currently in any public api.

The branch is not tested right now. Mostly because I kinda assume the "schema" for this is going to iterate a bunch. Once we start getting close to something that would be useful then I can start locking down the behavior with tests

@atodorov
Contributor

Folks, I'm just letting you know I won't be able to work on this at all. OTOH you may want to check out this thread on the PyCQA mailing list:
https://mail.python.org/pipermail/code-quality/2018-September/001065.html

@Bachmann1234
Contributor

Bachmann1234 commented Jul 10, 2019

It's an interesting spec. As someone who manages a tool that works with a lot of different static analysis tools I love the idea of all of them using a unified machine parsable format.

I'm not super convinced SARIF is a good fit for a coverage tool. It seems more designed for rule-based linters, where you have a set of rules and you are reporting on sections of code violating those rules.

It's always been my experience with coverage tools that they are mostly there to draw a picture, and the interpretation of that picture is left to other tools (features like 'fail under' notwithstanding).

This JSON report would provide a picture. I could imagine another tool that defines coverage rules for parts of a codebase, takes in this report, and spits out a SARIF-style report. But that's probably a project outside of this one.

@Bachmann1234
Contributor

Status report:

Here is what I have for a simple report. Nothing involving contexts or branch coverage yet (I'm still wrapping my head around arcs...)

{
    "version": "5.0a6",
    "timestamp": "1562902290",
    "measured_files": [
        {
            "measured_file": "example.py",
            "missing_lines": [
                5
            ],
            "executed_lines": [
                1,
                2,
                4,
                5,
                7,
                8
            ],
            "summary": {
                "missing_lines": 1,
                "covered_lines": 5,
                "num_statements": 6
            }
        }
    ],
    "totals": {
        "missing_lines": 1,
        "covered_lines": 5,
        "num_statements": 6
    }
}
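
A downstream tool could consume a report of this draft shape along these lines. This is only a sketch against the draft key names shown above; the sample values are made up for the example, and `missing_by_file` and `totals_match` are hypothetical helpers.

```python
import json

# Sample data in the draft shape; the values are invented for illustration.
DRAFT = """
{
    "measured_files": [
        {
            "measured_file": "example.py",
            "missing_lines": [5],
            "executed_lines": [1, 2, 4, 7, 8],
            "summary": {"missing_lines": 1, "covered_lines": 5, "num_statements": 6}
        }
    ],
    "totals": {"missing_lines": 1, "covered_lines": 5, "num_statements": 6}
}
"""


def missing_by_file(report):
    """Map each file that has uncovered lines to the list of those lines."""
    return {
        entry["measured_file"]: entry["missing_lines"]
        for entry in report["measured_files"]
        if entry["missing_lines"]
    }


def totals_match(report):
    """Check that the per-file summaries add up to the top-level totals."""
    summed = {
        key: sum(entry["summary"][key] for entry in report["measured_files"])
        for key in report["totals"]
    }
    return summed == report["totals"]


report = json.loads(DRAFT)
print(missing_by_file(report))
print(totals_match(report))
```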

For a run on a nontrivial script, here is a gist:

https://gist.github.com/Bachmann1234/88eb0c941112550034cc8de62ca3a9d7

I'm fairly happy with how the code shook out, and it was not as much of a hack spike as I was worried it would be. So I'm actually gonna write tests for what I have and aim to put up a WIP PR this weekend.

@Bachmann1234
Contributor

Bachmann1234 commented Jul 12, 2019

So I thought I was done for the night but I kinda kept messing around.

I added branch coverage stats and context stuff. I still think there is a lot to do. Tests, docs, settings around the report. Iterating on the structure of the report itself. etc etc. But I am excited :-)

Anyways, here is a version with branch coverage
https://gist.github.com/Bachmann1234/b251cbfac7033ba52d56c87e3e243696

Here is a version without branch
https://gist.github.com/Bachmann1234/a3ec2ddbcbeed621e8d756b917d48ce1

I still don't quite get arcs. They are pairs of line numbers... but when I look at the data there are a lot of negative numbers. I don't know what that means.

@Bachmann1234
Contributor

Note to self: "measured_files" should be a dict keyed on relative file path. That is probably far more useful than a list.

@nedbat
Owner Author

nedbat commented Jul 13, 2019

Thanks for keeping on this. It might be a little easier to look at schemas schematically (so to speak) rather than as a full data file.

In arcs, a negative first number -N means "entered a function starting at line N", and a negative second number -N means, "exited a function starting at line N."
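
Read that way, an arc can be turned into a sentence with a small helper. `describe_arc` is a hypothetical function written for this illustration; it applies only the sign rule described above.

```python
def describe_arc(arc):
    """Explain one (from_line, to_line) arc using the sign convention above."""
    start, end = arc
    if start < 0:
        # Negative first number -N: entered a function starting at line N.
        return f"entered the function starting at line {-start}, arriving at line {end}"
    if end < 0:
        # Negative second number -N: exited a function starting at line N.
        return f"left line {start}, exiting the function starting at line {-end}"
    return f"jumped from line {start} to line {end}"


for arc in [(-3, 4), (4, 5), (5, -3)]:
    print(arc, "->", describe_arc(arc))
```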

We should have percentages also, like the final column from an HTML report.

Maybe "measured_files" should just be "files", and have it be an object with the file name as the key?
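
Those two suggestions, percentages and a "files" object keyed by file name, might look like this when applied to the draft shape. This is a sketch only: `restructure`, `percent`, and the "percent_covered" key are names assumed for the example, as is the choice to report an empty file as 100% covered.

```python
def percent(covered, total):
    """Percentage of statements covered; an empty file counts as fully covered (an assumption)."""
    return 100.0 if total == 0 else 100.0 * covered / total


def restructure(report):
    """Turn the draft "measured_files" list into a "files" object keyed by name."""
    files = {}
    for entry in report["measured_files"]:
        summary = dict(entry["summary"])
        summary["percent_covered"] = percent(
            summary["covered_lines"], summary["num_statements"]
        )
        files[entry["measured_file"]] = {
            "executed_lines": entry["executed_lines"],
            "missing_lines": entry["missing_lines"],
            "summary": summary,
        }
    totals = dict(report["totals"])
    totals["percent_covered"] = percent(
        totals["covered_lines"], totals["num_statements"]
    )
    return {"files": files, "totals": totals}


draft = {
    "measured_files": [
        {
            "measured_file": "example.py",
            "missing_lines": [5],
            "executed_lines": [1, 2, 4, 7, 8],
            "summary": {"missing_lines": 1, "covered_lines": 5, "num_statements": 6},
        }
    ],
    "totals": {"missing_lines": 1, "covered_lines": 5, "num_statements": 6},
}
new_report = restructure(draft)
print(new_report["files"]["example.py"]["summary"])
```

Keying by file name lets a consumer look up one file directly instead of scanning a list.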

@nedbat
Owner Author

nedbat commented Aug 31, 2019

#825 added a JSON report.

@nedbat nedbat closed this as completed Aug 31, 2019
@nedbat
Owner Author

nedbat commented Sep 21, 2019

This is in 5.0a7.
