[tahoe-lafs-trac-stream] [Tahoe-LAFS] #4126: Unit test suite inconsistently failing on CircleCI (was: CI test_system fails inconsistently)
Tahoe-LAFS
trac at tahoe-lafs.org
Fri Dec 6 00:17:26 UTC 2024
#4126: Unit test suite inconsistently failing on CircleCI
------------------------------------+---------------------------
Reporter: hacklschorsch | Owner: hacklschorsch
Type: defect | Status: assigned
Priority: normal | Milestone: undecided
Component: dev-infrastructure | Version: n/a
Resolution: | Keywords: ci
Launchpad Bug: |
------------------------------------+---------------------------
Description changed by btlogy:
Old description:
> CI reactors under `test.test_system` on CircleCI fail inconsistently ONLY
> in the Tahoe-lafs Circle CI org.
> Cannot reproduce locally on Nixos nor on GitHub CI (inside similar docker
> images).
>
> Possible root cause discussed in
> https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2476885548
> meejah writes:
>
> > The unclean-reactor errors may be simply a downstream symptom of the
> > real errors that also happen in that run (e.g. several tests time out).
>
> My own tests suggest that indeed, raising the SystemTests timeout make
> [https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2444698978 a couple of flaky tests]
> much more stable:
>
> || Failure count || Test name ||
> || 1 || allmydata.test.test_system.HTTPSystemTest.test_mutable_mdmf ||
> || 3 || allmydata.test.test_system.HTTPSystemTest.test_mutable_sdmf ||
> || 30 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_convergent ||
> || 11 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_random_key ||
>
> This ticket is similar but not equal to ticket:4085, ticket:4022,
> ticket:2994 .
New description:
1. For at least 3 months (likely longer, but older logs are no longer
available), `test_verify_one_bad_encprivkey` has been failing spuriously in
the CircleCI logs (except on master, which was broken, see #4098).
2. More recently, `test_system.HTTPSystemTest` has been failing more often:
CI reactors under `test.test_system` on CircleCI fail inconsistently in
both the Tahoe-LAFS and LeastAuthority orgs (which are not on the same plan).
This cannot be reproduced locally on NixOS or on GitHub CI (inside similar
Docker images).
A possible root cause is discussed in
https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2476885548
where meejah writes:
> The unclean-reactor errors may be simply a downstream symptom of the
> real errors that also happen in that run (e.g. several tests time out).
My own tests suggest that raising the SystemTests timeout does indeed make
[https://github.com/tahoe-lafs/tahoe-lafs/pull/1381#issuecomment-2444698978 a couple of flaky tests]
much more stable:
|| Failure count || Test name ||
|| 1 || allmydata.test.test_system.HTTPSystemTest.test_mutable_mdmf ||
|| 3 || allmydata.test.test_system.HTTPSystemTest.test_mutable_sdmf ||
|| 30 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_convergent ||
|| 11 || allmydata.test.test_system.HTTPSystemTest.test_upload_and_download_random_key ||
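To see how the failures in the table above are distributed, the counts can
be tallied with a short script. This is only an illustrative sketch: the
counts are copied from the table, while the dictionary layout and the
`failure_shares` helper are hypothetical and not part of Tahoe-LAFS.

```python
# Failure counts copied from the table above (all are HTTPSystemTest methods).
failures = {
    "test_mutable_mdmf": 1,
    "test_mutable_sdmf": 3,
    "test_upload_and_download_convergent": 30,
    "test_upload_and_download_random_key": 11,
}


def failure_shares(counts):
    """Return each test's fraction of the total failure count."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


total = sum(failures.values())
shares = failure_shares(failures)

# Print the tests from most to least frequently failing.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {failures[name]} of {total} ({share:.0%})")
```

By this tally, the two `test_upload_and_download_*` tests account for 41 of
the 45 recorded failures.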
This ticket is similar, but not identical, to ticket:4085, ticket:4022,
and ticket:2994.
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/4126#comment:7>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage