[tahoe-lafs-trac-stream] [Tahoe-LAFS] #4126: Unit test suite inconsistently failing on CircleCI

Tahoe-LAFS trac at tahoe-lafs.org
Sat Dec 7 10:16:33 UTC 2024


#4126: Unit test suite inconsistently failing on CircleCI
------------------------------------+---------------------------
     Reporter:  hacklschorsch       |      Owner:  hacklschorsch
         Type:  defect              |     Status:  assigned
     Priority:  normal              |  Milestone:  undecided
    Component:  dev-infrastructure  |    Version:  n/a
   Resolution:                      |   Keywords:  ci
Launchpad Bug:                      |
------------------------------------+---------------------------
Description changed by btlogy:

Old description:

> 1. For at least 3 months (likely longer, but older logs are no longer
> available), `test_verify_one_bad_encprivkey` has been failing spuriously
> in the CircleCI logs (except on master, which was broken: see #4098).
> 2. More recently, `test_system.HTTPSystemTest` has been failing more often:
>
> CI reactors under `test.test_system` on CircleCI fail inconsistently in
> both the Tahoe-LAFS and LeastAuthority orgs (which are not on the same
> plan). This cannot be reproduced locally on NixOS or on GitHub CI (inside
> similar Docker images).
>
> Possible root cause discussed in
> https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2476885548
> where meejah writes:
>
> > The unclean-reactor errors may be simply a downstream symptom of the
> > real errors that also happen in that run (e.g. several tests time out).
>
> My own tests suggest that raising the SystemTests timeout indeed makes
> [https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2444698978 a couple of flaky tests]
> much more stable:
>
> || Failure count || Test name ||
> ||     1 || allmydata.test.test_system.HTTPSystemTest.test_mutable_mdmf ||
> ||     3 || allmydata.test.test_system.HTTPSystemTest.test_mutable_sdmf ||
> ||    30 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_convergent ||
> ||    11 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_random_key ||
>
> This ticket is similar, but not identical, to ticket:4085, ticket:4022,
> and ticket:2994.

New description:

 1. For at least 3 months (likely longer, but older logs are no longer
 available), `test_verify_one_bad_encprivkey` has been failing spuriously
 in the CircleCI logs (except on master, which was broken: see #4098).
 2. More recently, `test_system.HTTPSystemTest` has been failing more often:

 CI reactors under `test.test_system` on CircleCI fail inconsistently in
 both the Tahoe-LAFS and LeastAuthority orgs (which are not on the same
 plan). This cannot be reproduced locally on NixOS or on GitHub CI (inside
 similar Docker images).

 Possible root cause discussed in
 https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2476885548
 where meejah writes:

 > The unclean-reactor errors may be simply a downstream symptom of the
 > real errors that also happen in that run (e.g. several tests time out).

 My own tests suggest that raising the SystemTests timeout indeed makes
 [https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2444698978 a couple of flaky tests]
 much more stable:

 || Failure count || Test name ||
 ||     1 || allmydata.test.test_system.HTTPSystemTest.test_mutable_mdmf ||
 ||     3 || allmydata.test.test_system.HTTPSystemTest.test_mutable_sdmf ||
 ||    30 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_convergent ||
 ||    11 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_random_key ||
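
 A per-test failure tally like the one in the table could be produced with a few lines of Python. This is only a sketch: the function names `tally_failures` and `as_trac_table` and the sample test IDs are hypothetical, and it assumes the failing test IDs have already been extracted from the CI logs (one entry per failing test per run). It is not the script used to build the table in this ticket.

```python
from collections import Counter


def tally_failures(failed_test_ids):
    """Count how often each test ID appears across all CI runs."""
    return Counter(failed_test_ids)


def as_trac_table(counts):
    """Render the counts as a Trac wiki table, most frequent first."""
    rows = ["|| Failure count || Test name ||"]
    for name, n in counts.most_common():
        rows.append(f"|| {n:>5} || {name} ||")
    return "\n".join(rows)


# Hypothetical sample input, NOT real CI data.
sample = [
    "pkg.test_mod.SomeTest.test_upload",
    "pkg.test_mod.SomeTest.test_download",
    "pkg.test_mod.SomeTest.test_upload",
]
print(as_trac_table(tally_failures(sample)))
```

 Sorting by count puts the flakiest tests first, which makes it easier to see whether a change such as a raised timeout shifts the distribution.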

 This ticket is similar, but not identical, to ticket:4085, ticket:4022,
 and ticket:2994.

 NOTE: there is an ongoing collaborative effort to get to the bottom of
 this issue, tracked in this temporary document:
 https://cryptpad.fr/code/#/2/code/view/ApS8GZH4OfKbR71RdkRa1LLClaJk88emHeW0yvwhHkk/

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/4126#comment:12>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

