#2297 new defect

improve precision of coverage reports by including coverage by subprocesses

Reported by: daira
Owned by:
Priority: normal
Milestone: undecided
Component: code
Version: 1.10.0
Keywords: coverage subprocess
Cc:
Launchpad Bug:

Description (last modified by daira)

Currently, coverage reports underestimate covered code in some places (this particularly affects the code for starting nodes) because only Python code directly executed by coverage run is measured.

coverage has a feature to measure the code running in multiple processes and then combine the measurements:

http://nedbatchelder.com/code/coverage/subprocess.html
http://nedbatchelder.com/code/coverage/cmd.html#cmd-combining

To make this work, we need to:

  • Run this early on when starting Tahoe processes (e.g. in src/allmydata/__init__.py):
    try:
        # If coverage is installed and COVERAGE_PROCESS_START points at a
        # coverage config file, start measuring this process too; otherwise
        # this is a no-op for normal (non-coverage) runs.
        from coverage import process_startup
        process_startup()
    except ImportError:
        pass
    
  • Set the COVERAGE_PROCESS_START environment variable (which will be inherited by subprocesses) when running coverage from setup.py and the Makefile.
  • Pass the -p (--parallel-mode) option to the coverage run command line, so that each process writes its own .coverage.* data file.
  • Ensure that the .coverage.* files are all generated in the same directory.
  • After the coverage run finishes, execute coverage combine (the sketch after this list shows how these steps fit together).
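
Sketching how these steps might fit together from the setup.py/Makefile side (illustrative only: the .coveragerc path, the use of shutil.which to locate trial, and the allmydata.test target are assumptions, not something this ticket specifies):

    # Hypothetical driver for running the test suite with subprocess
    # coverage enabled and then merging the per-process data files.
    import os
    import shutil
    import subprocess

    def run_tests_with_subprocess_coverage():
        env = dict(os.environ)
        # Subprocesses inherit this variable; coverage's process_startup()
        # hook (see the snippet above) only activates when it points at a
        # coverage config file.
        env["COVERAGE_PROCESS_START"] = os.path.abspath(".coveragerc")

        # -p / --parallel-mode makes each process write its own .coverage.*
        # data file instead of clobbering a single shared .coverage file.
        trial = shutil.which("trial")  # Twisted's test runner, assumed installed
        subprocess.check_call(
            ["coverage", "run", "-p", trial, "allmydata.test"],
            env=env,
        )

        # All .coverage.* files must end up in the same directory so that
        # "coverage combine" can merge them into a single .coverage file.
        subprocess.check_call(["coverage", "combine"], env=env)
        subprocess.check_call(["coverage", "report"], env=env)

    if __name__ == "__main__":
        run_tests_with_subprocess_coverage()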

Change History (6)

comment:1 Changed at 2014-09-10T08:35:02Z by daira

  • Description modified

comment:2 Changed at 2014-09-10T16:35:05Z by warner

Or, let's abolish subprocesses! I think we can get rid of most of our uses of subprocess.call(). The hardest one to get rid of would be code that tests routines that do a fork-but-not-exec (e.g. tahoe run). In Petmail I was mostly able to implement that with threads instead. I'm not sure whether that achieves coverage of the code on the far side of the fork, but I'll look.
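
A minimal sketch of what that thread-instead-of-fork approach might look like (node_main here is a hypothetical stand-in for whatever tahoe run executes on the child side of the fork, not real Petmail or Tahoe-LAFS code):

    # Hedged illustration: instead of fork-but-not-exec, run the child-side
    # code in a thread of the test process so the in-process coverage
    # tracer still sees it.  node_main() is a hypothetical entry point.
    import threading

    def node_main(basedir, ready):
        # ... start the node rooted at basedir, then signal readiness ...
        ready.set()

    def start_node_in_thread(basedir):
        ready = threading.Event()
        t = threading.Thread(target=node_main, args=(basedir, ready))
        t.daemon = True
        t.start()
        ready.wait(30)  # wait up to 30 seconds for the node to come up
        return t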

comment:3 Changed at 2014-09-11T00:31:17Z by daira

I don't see how we can abolish subprocesses. We do perform as many of the runner tests as possible using deferToThread already, but that doesn't fully test how nodes are actually run (indeed we've had several bugs that only showed up in the subprocess tests, because they are more realistic).
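
As a rough illustration of that in-process style (run_cli here is a hypothetical helper, not the actual Tahoe-LAFS test API):

    # A runner test kept in-process via deferToThread, so coverage of the
    # exercised code needs no subprocess measurement.
    from twisted.internet.threads import deferToThread
    from twisted.trial import unittest

    def run_cli(argv):
        # ... invoke the command-line runner directly, return its exit code ...
        return 0

    class CreateNodeTests(unittest.TestCase):
        def test_create_node(self):
            d = deferToThread(run_cli, ["create-node", "basedir"])
            d.addCallback(self.assertEqual, 0)
            return d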

Last edited at 2014-09-11T00:31:30Z by daira

comment:4 Changed at 2020-01-13T20:23:57Z by exarkun

I think it *should* be possible to eliminate most subprocess use in the test suite. The tests that run subprocesses are error-prone and slow on top of being a problem for coverage measurement.

*All* code should be covered by unit tests in-process. It may be necessary to *also* have some tests which run a small number of very simple subprocesses in order to test real code that does subprocess management, but other application logic should not be exercised by these tests.

Also, despite the information in the ticket description, measuring subprocess coverage is *hard*. coverage.py adds enough overhead that, when the subprocesses run by a test have it turned on, many tests can't complete within the deadlines imposed by various test runners and CI systems. All of that code could *probably* be sped up so this isn't a problem, but I would much rather see the effort put into removing the over-use of subprocesses in the test suite.

Also, since this ticket was created, a new suite of "integration" tests has been added which *does* measure coverage on the subprocesses it launches. This coverage isn't integrated into the coverage report generated by the unit test runs, but it could be (or it could be reported alongside or something).

However, it has never been entirely clear to me what the value of subprocess coverage measurements is. Tests that run a whole Tahoe-LAFS child process execute huge amounts of code, but they generally don't assert the *correctness* of very much of that code. All a coverage report can tell you is what was *executed*. Coverage measurement of one single test that runs a child process will show you thousands or tens of thousands of lines of covered code. Sure, it was executed, but is it right? That one test hardly tells you.

comment:5 Changed at 2020-01-13T20:24:34Z by exarkun

tl;dr: the coverage reports do currently underreport *executed* code, but I don't believe they underreport *well-tested* code.
