#3869 new defect

Intermittent allmydata.test.test_storage_http.GenericHTTPAPITests.test_bad_authentication failure

Reported by: exarkun Owned by:
Priority: normal Milestone: undecided
Component: unknown Version: n/a
Keywords: Cc:
Launchpad Bug:

Description (last modified by exarkun)

Sometimes when running the full test suite, especially with high test concurrency (-j8 or higher), test_bad_authentication fails like this:

[ERROR]
Traceback (most recent call last):
Failure: testtools.testresult.real._StringException: Empty attachments:
  twisted-log

Traceback (most recent call last):
  File "/nix/store/23igmvfrawyi9hzlhhx3sja6jzdxwwgq-python3-3.7.11-env/lib/python3.7/site-packages/testtools/twistedsupport/_runtest.py", line 386, in _log_user_exception
    raise e
testtools.twistedsupport._runtest.UncleanReactorError: The reactor still thinks it needs to do things. Close all connections, kill all processes and make sure all delayed calls have either fired or been cancelled:
  <DelayedCall 0x7f339af67710 [-0.39075684547424316s] called=0 cancelled=1>


allmydata.test.test_storage_http.GenericHTTPAPITests.test_bad_authentication

Change History (3)

comment:1 Changed at 2022-01-28T16:11:10Z by exarkun

  • Description modified (diff)

comment:2 Changed at 2022-01-28T18:16:09Z by exarkun

Also observed in allmydata.test.test_storage_http.GenericHTTPAPITests.test_version.

comment:3 Changed at 2022-01-31T13:44:10Z by exarkun

Some observations:

  • These seem to happen on two newly configured CI jobs (the new NixOS jobs that replaced the old ones)
  • These seem to happen when concurrency is high (the old CI jobs limited trial to 3 workers, the new CI jobs limit trial to 8 workers)

I tried to investigate on Friday, but bugs and missing features in trial's concurrent runner ("disttrial") consumed all the time I put in, along with other unrelated-but-blocking problems in Twisted's test suite.

It would be great to be able to reproduce the problem off CI. In principle this should be doable, since CI runs in a Docker image and uses reproducible-build Nix expressions. In practice, the problem may depend on timing specific to the hardware or load of the real CI runner environment ...

I think that's worth trying, at least. Failing that, we could try just cranking concurrency down on these jobs (back to 3, I guess) and see if that helps.
