Use upstream coveragepy's new lcov support #14677

TLATER · 2022-01-31T17:33:45Z

As explained in 8c3ea99, coveragepy recently added lcov support. This patch set intends to tackle #10660 (and bazelbuild/rules_python#43), and make python coverage fully usable out-of-the box. It builds on work by @ulfjack in #10433, but now with actual support from upstream.

This also incidentally fixes a bug where bazel test would no longer work if the PYTHON_COVERAGE environment variable was set (can report/fix separately if desired), as well as #14436 (this should perhaps be raised in a separate PR, but made much more sense after refactoring here).

See the commit messages for more detailed comments.

This is a draft, but I would like some review of the overall approach, and whether this is workable in this state. I'm raising this as a draft for these reasons:

A commit to allow coveragepy resolution by label is included. It's not implemented correctly at the moment (see commit message), and I'd like feedback on how this could be achieved better.
The comments regarding os.execv usage seem to have a strange history (see commit message), and I would like to figure out what happened here before continuing the legacy.
More consideration for other coverage tools might be more important than I think.
Testing for this is slim because it depends on coveragepy and I'm unsure how/if I should integrate that into CI.
The python coverage documentation should be updated.

@c-mita and @lfpino may also be interested.

TLATER · 2022-01-31T22:30:40Z

Hm, interesting, I'll have to look into those test failures.

TLATER · 2022-02-01T13:16:56Z

The failing test is //src/test/shell/bazel:python_version_test, the culprit is an accidental f-string. New habits die hard, it seems.

comius · 2022-02-02T06:42:05Z

cc @rickeylev for Python

[Coveragepy recently gained support for lcov output][nedbat/coveragepy#1289], which allows implementing full support for python coverage without relying on a downstream fork of that project Coveragepy actually must be invoked twice; One generating a `.coverage` database file, the other exporting the data. This means that it is incompatible with the old implementation. The fork of coveragepy previously intended for use with bazel circumvented this by just changing how coveragepy works, and never outputting that database - just outputting the lcov directly instead. If we'd like to use upstream coveragepy, this is of course not possible. The stub_template seems to be written with the idea of supporting other coverage tooling in mind, however it still hard-codes arguments specific to coveragepy. Instead, we think it makes sense to properly support one of them for now, and to rethink a more generic interface later - it will probably take specific scripting for each implementation of coverage in python anyway. As such, this patch rewrites the python stub template to fully support upstream coveragepy as a coverage tool, and reworks some of the logic around invoking python to do so more cleanly. Additional notes: - Python coverage will only work with Python 3.7+ with upstream coveragepy, since the first release with lcov support does not support earlier Python versions - this is unfortunate, but there is not much we can do downstream short of forking to resolve that. The stub template itself should still work with Python 2.4+. - Comments in the code claim to use `os.execv` for performance reasons. There may be a small overhead to `subprocess.call`, but it shouldn't be too impactful, especially considering the overhead in logic (written in Python) this involves - if this is indeed for performance reasons, this is probably a somewhat premature optimization. A colleauge helped dig through some history, finding 3bed4af as the source of this - if that commit is to believed, this is actually to resolve issues with signal handling, however that seems odd as well, since this calls arbitrary python applications, which in turn may use subprocesses again as well, and therefore break what that commit seems to attempt to fix. It's completely opaque to me why we put so much effort into trying to ensure we use `os.execv`. I've replicated the behavior and comments assuming it was correct previously, but the patch probably shouldn't land as-is - the comment explaining the use of `os.execv` is most likely misleading. --- [nedbat/coveragepy#1289]: nedbat/coveragepy#1289 Co-authored-by: Bradley Burns <bradley.burns@codethink.co.uk>

This resolves bazelbuild#14436 by adding a new environment variable that will perform the coverage label resolution inside the python_stub_template directly, which resolves the python coverage tool by label rather than path. Currently this resolves the path in the runfiles directory by guessing the path the label should resolve to - this of course does not work in general, even just defining an alias breaks it. Since labels appear to only be resolved in the analysis phase, there does not seem to be an easy way around this, however. This is a draft, showing the behavior I would like - suggestions on how to correctly implement this would be appreciated. Co-authored-by: Bradley Burns <bradley.burns@codethink.co.uk>

adam-azarchs · 2022-04-29T22:39:45Z

Not to distract from what this PR is trying to accomplish, but it's only part of the puzzle when it comes to hermetic python toolchains. In order to support those, we'd need to also add an attribute to py_runtime to supply a target for coveragepy, and I guess plumb that through to PYTHON_COVERAGE. That sort of thing is particularly important in a remote build scenario where the developer may have no control over whether there exists a system-wide installation of coverage on the remote builder.

TLATER · 2022-04-30T11:49:22Z

That sounds like a good idea. We were using rules_python to provide it instead, but it also currently lacks toolchain support (and did things rather more impurely than I'm comfortable with).

rickeylev · 2022-04-30T17:43:04Z

we'd need to also add an attribute to py_runtime to supply a target for coveragepy, and I guess plumb that through to PYTHON_COVERAGE.

I'm not too familiar with how coverage works, so forgive me: why does the py_runtime rule need a coverage attribute? That would imply that a particular runtime and coverage are intrinsically linked (e.g. a particular coverage implementation only works with a particular runtime, and vice versa). IIUC, coverage isn't so strictly tied to a runtime, so it seems more like something that would go on the toolchain or rules (which wire together the runtime with other parts).

adam-azarchs · 2022-05-01T03:27:24Z

py_runtime is how you set up the python toolchain. A python toolchain setup looks something like

py_runtime(
    name = "python3_runtime",
    files = [":python"],
    interpreter = ":python_interpreter_exe",
    python_version = "PY3",
)

py_runtime_pair(
    name = "python_runtime_pair",
    py3_runtime = ":python3_runtime",
)

toolchain(
    name = "python_toolchain",
    exec_compatible_with = ["@platforms//os:linux"],
    target_compatible_with = ["@platforms//os:linux"],
    toolchain = ":python_runtime_pair",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
    visibility = ["//visibility:public"],
)

While one could just put coverage into the files attribute of py_runtime, making it a separate attribute would have a few advantages:

No need for the user to muck around with environment variables (which always seems a little but "un-bazely" to me).
Only depending on the coverage library when running with code coverage.
Being able to supply different coverage libraries to different runtimes, e.g. probably an older version for python 2, if anyone is still using that.

rickeylev · 2022-05-01T07:39:50Z

Ahhhh, huh. Thanks. I didn't realize py_runtime was, in effect, the toolchain (we don't have toolchains enabled for Python at Google, so this isn't an area I'm very familiar with). Well, that's kind of unfortunate naming, but yeah, putting the coverage settings on the toolchain could make sense. TBH, I don't fully understand the pros/cons of a toolchain over an implicit attribute of the rule, but both are pretty transparent to the user, so I'm not sure if it matters much.

adam-azarchs · 2022-05-02T16:22:34Z

The advantage of a toolchain is that it can be configured. An implicit attribute of the rule has to be a fixed target. In our use case we very specifically cannot use the automatically-configured python toolchain based on the system-wide installation, because that is usually not the right version of python.

adam-azarchs · 2022-05-25T02:31:33Z

Adding an attribute to py_runtime doesn't look like it would be that hard to do; I'd probably have already made a PR to do it except it's still in the java side of things, with which I have a lot less experience. But also, as I said earlier, it's kind of orthogonal to this PR which, as far as I understand it, would still be required in order to make things actually work.

@TLATER

This allows users to specify a target providing the coveragepy tool (and its dependencies). This is essential for hermetic python builds, where an absolute path will not really work. It's also superior to other potential methods using environment variables because the runfiles dependency on the coverage tool and its files is only incurred when building with coverage enabled. This also builds on the work @TLATER began with #14677 to integrate with `coveragepy`'s `lcov` support, with an additional step of at least attempting to convert the absolute paths which `coveragepy` uses in the lcov output into the relative paths which the rest of bazel can actually consume. This is my first time touching Java code professionally, so I'll admit to mostly cargo-culting those parts, and would welcome any feedback on how to improve things there. I also would have no objections to someone else taking over this PR to get it over the finish line. I've tested this out with our own team's internal monorepo, and have successfully generated a full combined coverage report for most of our python and go code. There's still a bunch of things which don't quite work, in particular when it comes to compiled extension modules or executables run from within python tests, but those will need to be addressed separately, and this is already a giant step forward for our team. Closes #14436. Closes #15590. PiperOrigin-RevId: 476314433 Change-Id: I4be4d10e0af741f4ba1a7b5367c6f7a338a3c43d

@TLATER

This allows users to specify a target providing the coveragepy tool (and its dependencies). This is essential for hermetic python builds, where an absolute path will not really work. It's also superior to other potential methods using environment variables because the runfiles dependency on the coverage tool and its files is only incurred when building with coverage enabled. This also builds on the work @TLATER began with bazelbuild#14677 to integrate with `coveragepy`'s `lcov` support, with an additional step of at least attempting to convert the absolute paths which `coveragepy` uses in the lcov output into the relative paths which the rest of bazel can actually consume. This is my first time touching Java code professionally, so I'll admit to mostly cargo-culting those parts, and would welcome any feedback on how to improve things there. I also would have no objections to someone else taking over this PR to get it over the finish line. I've tested this out with our own team's internal monorepo, and have successfully generated a full combined coverage report for most of our python and go code. There's still a bunch of things which don't quite work, in particular when it comes to compiled extension modules or executables run from within python tests, but those will need to be addressed separately, and this is already a giant step forward for our team. Closes bazelbuild#14436. Closes bazelbuild#15590. PiperOrigin-RevId: 476314433 Change-Id: I4be4d10e0af741f4ba1a7b5367c6f7a338a3c43d

TLATER · 2022-11-10T17:49:21Z

Superseded by @adam-azarchs ' #15590, thanks :)

TLATER force-pushed the tlater/coveragepy-lcov-support branch from d923f21 to da31bce Compare February 1, 2022 12:59

gregestren added the team-Rules-Python Native rules for Python label Feb 1, 2022

gregestren assigned comius Feb 1, 2022

comius assigned comius and unassigned comius Feb 2, 2022

TLATER and others added 2 commits February 2, 2022 16:27

TLATER force-pushed the tlater/coveragepy-lcov-support branch from da31bce to f4f1c23 Compare February 2, 2022 16:28

comius removed their assignment Mar 23, 2022

comius requested a review from c-mita April 6, 2022 06:23

sgowroji added the awaiting-user-response Awaiting a response from the author label May 5, 2022

adam-azarchs mentioned this pull request May 28, 2022

rules/python: Add a coverage_tool attribute to py_runtime. #15590

Closed

TLATER closed this Nov 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use upstream coveragepy's new lcov support #14677

Use upstream coveragepy's new lcov support #14677

TLATER commented Jan 31, 2022

TLATER commented Jan 31, 2022

TLATER commented Feb 1, 2022

comius commented Feb 2, 2022

adam-azarchs commented Apr 29, 2022

TLATER commented Apr 30, 2022

rickeylev commented Apr 30, 2022

adam-azarchs commented May 1, 2022

rickeylev commented May 1, 2022

adam-azarchs commented May 2, 2022

adam-azarchs commented May 25, 2022

TLATER commented Nov 10, 2022

Use upstream coveragepy's new lcov support #14677

Use upstream coveragepy's new lcov support #14677

Conversation

TLATER commented Jan 31, 2022

TLATER commented Jan 31, 2022

TLATER commented Feb 1, 2022

comius commented Feb 2, 2022

adam-azarchs commented Apr 29, 2022

TLATER commented Apr 30, 2022

rickeylev commented Apr 30, 2022

adam-azarchs commented May 1, 2022

rickeylev commented May 1, 2022

adam-azarchs commented May 2, 2022

adam-azarchs commented May 25, 2022

TLATER commented Nov 10, 2022