[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2891: fix intermittent test coverage
Tahoe-LAFS
trac at tahoe-lafs.org
Tue Aug 15 21:14:24 UTC 2017
#2891: fix intermittent test coverage
------------------------+-----------------------
Reporter: warner | Owner:
Type: defect | Status: new
Priority: normal | Milestone: undecided
Component: code | Version: 1.12.1
Resolution: | Keywords:
Launchpad Bug: |
------------------------+-----------------------
New description:
This ticket is to keep track of lines which sometimes get covered during
unit tests and sometimes do not. This probably happens because those lines
are only exercised by "system"-style tests, which randomly hit some servers
and not others. We should fix these by writing more narrowly-focussed
"unit"-style tests that exercise just the function in question,
deterministically (a minimal illustration of that style of test follows
below).
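
As an illustration of the distinction (hypothetical names, not actual
Tahoe-LAFS code): a system-style test covers whichever branch the random
server selection happens to reach on that run, while a unit-style test
constructs its inputs so that both branches run every time.

{{{
#!python
# Hypothetical function with a branch that a system-style test only
# sometimes reaches, depending on which servers were picked at random.
def classify_share(share_id, known_bad):
    if share_id in known_bad:
        return "bad"
    return "good"

# Narrowly-focussed and deterministic: both branches run on every test run.
def test_classify_share_covers_both_branches():
    assert classify_share("sh0", known_bad=set()) == "good"
    assert classify_share("sh0", known_bad={"sh0"}) == "bad"
}}}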
The most annoying thing about these coverage gaps is that they can cause
perfectly valid pull requests to be flagged as broken (because the measured
test coverage happened to drop when the PR went through CI, through no
fault of the PR's changes).
Let's ~~strikeout~~ these items as we fix them.
* src/allmydata/mutable/servermap.py
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/74e7ef4b98bc322da5a49c33e0706b57aab3a6d3/src/allmydata/mutable/servermap.py#L39 L 39: UpdateStatus.add_per_server_time]~~~
    * the status object's `.timings["per_server"]` attribute is a dict-of-lists, and line 39 handles the case where we *don't* have to add a new entry to the dict. This should just be replaced by a `collections.defaultdict`, rather than needing any new tests (see the sketch below).
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/74e7ef4b98bc322da5a49c33e0706b57aab3a6d3/src/allmydata/mutable/servermap.py#L923 L 923: ServermapUpdater._try_to_validate_privkey]
    * this clause checks that the share contained an (encrypted) RSA signing key that matches the expectation we got from the writecap. The clause should succeed for normal shares, but will fail for corrupted shares. Apparently we have a test that corrupts some shares, but then doesn't always look at them.
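
A minimal sketch of the `collections.defaultdict` change suggested above,
assuming `timings["per_server"]` is a plain dict of lists keyed by server
(the class and method signature here are illustrative, not the real ones):

{{{
#!python
from collections import defaultdict

class UpdateStatusSketch(object):
    """Illustrative stand-in, not the real UpdateStatus class."""
    def __init__(self):
        # defaultdict(list) removes the "is this server already present?"
        # branch entirely, so there is no sometimes-covered line left to test
        self.timings = {"per_server": defaultdict(list)}

    def add_per_server_time(self, server, elapsed):
        # previously something like:
        #   if server not in self.timings["per_server"]:
        #       self.timings["per_server"][server] = []
        #   self.timings["per_server"][server].append(elapsed)
        self.timings["per_server"][server].append(elapsed)
}}}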
* src/allmydata/mutable/publish.py
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/mutable/publish.py#L451 L 451: Publish.publish]
    * to exercise this, we need some bad shares recorded in the servermap
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/mutable/publish.py#L496 L 496: Publish.publish]
* src/allmydata/immutable/upload.py
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/d91516a5c5c7d6c6a8d040aed4302aa2b9e3bf90/src/allmydata/immutable/upload.py#L703 Tahoe2ServerSelector._buckets_allocated()]
    * the `if s in self.homeless_shares` inside the non-Failure path doesn't always exercise both sides of the branch
* src/allmydata/immutable/downloader/share.py
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/d91516a5c5c7d6c6a8d040aed4302aa2b9e3bf90/src/allmydata/immutable/downloader/share.py#L211 Share.loop()]
    * this share-is-corrupted branch is not always exercised
  * L228-236
    * the !DataUnavailable branch is not always exercised
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/d91516a5c5c7d6c6a8d040aed4302aa2b9e3bf90/src/allmydata/immutable/downloader/share.py#L277 L277-279 Share._do_loop()]
    * this doesn't always see a non-empty `disappointment` array, and sometimes doesn't raise !DataUnavailable. This is the root cause of the L228-L236 non-coverage above.
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/d91516a5c5c7d6c6a8d040aed4302aa2b9e3bf90/src/allmydata/immutable/downloader/share.py#L386 L386-389 _satisfy_offsets()]
    * the `share_hashes_size` test doesn't exercise both sides of the branch
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/d91516a5c5c7d6c6a8d040aed4302aa2b9e3bf90/src/allmydata/immutable/downloader/share.py#L761 L761 _got_data()]
    * the `if not self._alive` check doesn't always exercise both branches
* src/allmydata/util/dictutil.py
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/util/dictutil.py#L29 L 29 subtract()]~~~
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/util/dictutil.py#L230 L 230 NumDict.item_with_largest_value()]~~~
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/util/dictutil.py#L438 L 438 ValueOrderedDict.__repr_n__]~~~
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/util/dictutil.py#L507 L 507 ValueOrderedDict.__setitem__]~~~
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/util/dictutil.py#L511 L 511 ValueOrderedDict.__setitem__]~~~
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/util/dictutil.py#L550 L 550 ValueOrderedDict.__delitem__]~~~
  * ~~~[https://github.com/tahoe-lafs/tahoe-lafs/blob/95ac5494ffaad61494e0616286894a20b87b7da8/src/allmydata/util/dictutil.py#L597 L 597 ValueOrderedDict.pop]~~~
* src/allmydata/stats.py
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/3f2f7dfb05ad7f505b70323459aec8d70d2a3ab5/src/allmydata/stats.py#L68 L64-68 LoadMonitor.get_stats()]
    * noticed in a [https://codecov.io/gh/tahoe-lafs/tahoe-lafs/pull/421/changes build of PR421]
    * get_stats() returns an average of the last 60 datapoints, but if there are no datapoints at all, a different branch is taken
    * that no-datapoints branch is sometimes not covered by the tests (a deterministic test for it is sketched below)
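
A sketch of the shape of that branch and of a deterministic test for it
(`LoadMonitorSketch` is an illustrative stand-in, not the real
`LoadMonitor`, and the averaging details are an assumption):

{{{
#!python
from collections import deque

class LoadMonitorSketch(object):
    """Illustrative stand-in for LoadMonitor, not the real class."""
    NUM_SAMPLES = 60

    def __init__(self):
        self.datapoints = deque(maxlen=self.NUM_SAMPLES)

    def add_datapoint(self, load):
        self.datapoints.append(load)

    def get_stats(self):
        if not self.datapoints:
            # the sometimes-uncovered branch: no samples recorded yet
            return {"load_monitor.avg_load": 0.0}
        avg = sum(self.datapoints) / len(self.datapoints)
        return {"load_monitor.avg_load": avg}

# A narrow unit test hits the empty branch on every run, instead of hoping
# a system test happens to poll the stats before any samples arrive:
def test_get_stats_with_no_datapoints():
    assert LoadMonitorSketch().get_stats()["load_monitor.avg_load"] == 0.0
}}}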
* src/allmydata/web/status.py
  * also from that PR421 build
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/3f2f7dfb05ad7f505b70323459aec8d70d2a3ab5/src/allmydata/web/status.py#L758 L758 RetrieveStatusPage.render_server_timings()]
    * sometimes the `fetch_per_server` timing data is empty, which skips parts of this function
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/3f2f7dfb05ad7f505b70323459aec8d70d2a3ab5/src/allmydata/web/status.py#L925 L925 MapupdateStatusPage.render_privkey_from()]
    * sometimes this is called when the privkey has not been fetched, so it skips the affirmative side of the branch
  * [https://github.com/tahoe-lafs/tahoe-lafs/blob/d91516a5c5c7d6c6a8d040aed4302aa2b9e3bf90/src/allmydata/web/status.py#L1117 L1117 Status.childFactory()]
    * the loop that looks for a "down-%s" name in h.list_all_download_statuses() doesn't always exercise both sides of the `if s.get_counter() == count` clause (see the sketch below)
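
A sketch of how both sides of that `get_counter()` comparison could be
pinned down with stub status objects, rather than relying on whichever
downloads a system test happened to perform (the stub and helper names are
illustrative, not the real web/status.py code):

{{{
#!python
class StubDownloadStatus(object):
    def __init__(self, counter):
        self._counter = counter
    def get_counter(self):
        return self._counter

def find_download_status(all_statuses, count):
    # mirrors the childFactory loop: skip statuses whose counter does not
    # match the requested "down-%s" name, return the one that does
    for s in all_statuses:
        if s.get_counter() == count:
            return s
    return None

def test_find_download_status_covers_both_sides():
    statuses = [StubDownloadStatus(0), StubDownloadStatus(1)]
    assert find_download_status(statuses, 1).get_counter() == 1  # match taken
    assert find_download_status(statuses, 5) is None             # match skipped
}}}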
--
Comment (by warner):
[27348be] deleted most of dictutil.py, so those lines are no longer a
problem
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2891#comment:8>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage