#1467 closed defect (fixed)

nondeterministic test failure in allmydata.test.test_system.SystemTest.test_filesystem because the Welcome page says "Connected to helper: no" when the test expected "yes"

Reported by: davidsarah Owned by: warner
Priority: normal Milestone: 1.9.2
Component: code Version: 1.8.2
Keywords: test heisenbug Cc:
Launchpad Bug:

Description

http://tahoe-lafs.org/buildbot/builders/Brian%20ubuntu-i386%20linode/builds/283/steps/test/logs/stdio :

[FAIL]: allmydata.test.test_system.SystemTest.test_filesystem

Traceback (most recent call last):
  File "/home/buildbot/bb-tahoe/brian-linode/build/src/allmydata/test/test_system.py", line 1102, in _got_welcome_helper
    page)
twisted.trial.unittest.FailTest: <html><head>
[...]
  <div class="connected-no">
    
    <div>Helper: <span>pb://ncieuno4yjvioe7fz5y7tqhl3nfrmawh@72.14.177.254:36780,127.0.0.1:36780/2w3qpdgcqegdd3nvhyn2eyaiphzmyog3</span></div>
    <div>Connected to helper?: <span>no</span></div>
  </div>
[...]
</body></html>

Change History (13)

comment:1 Changed at 2011-08-04T02:58:21Z by davidsarah

The test that failed is at src/allmydata/test/test_system.py@5051#L1101.

comment:2 Changed at 2011-08-04T02:59:46Z by davidsarah

  • Keywords heisenbug added
  • Summary changed from test failure in allmydata.test.test_system.SystemTest.test_filesystem because the Welcome page says "Connected to helper: no" when the test expected "yes" to nondeterministic test failure in allmydata.test.test_system.SystemTest.test_filesystem because the Welcome page says "Connected to helper: no" when the test expected "yes"

zooko: "Yes, I've seen that. It is non-deterministic."

comment:3 follow-up: Changed at 2012-02-08T18:26:10Z by zooko

test_filesystem just irreproducible FAILed again, and this time it happened on one my favorite buildslaves -- the eminently stable and well-performing "Kyle OpenBSD amd64":

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/Kyle%20OpenBSD%20amd64/builds/22

This time there was a different error message, which might be a useful clue:

twisted.web.error.Error: 404 Not Found

Here is the part of the test.log which appears after it announces SystemTest.test_filesystem and before it announces the next test (SystemTest.test_filesystem_with_cli_in_subprocess):

2012-02-01 01:36:53.422Z [-] --> allmydata.test.test_system.SystemTest.test_filesystem <--
2012-02-01 01:36:53.434Z [-] foolscap.pb.Listener starting on 19624
2012-02-01 01:36:53.435Z [-] nevow.appserver.NevowSite starting on 19673
2012-02-01 01:36:53.435Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x21d8becf8>
2012-02-01 01:36:53.436Z [-] My pid: 30725
2012-02-01 01:36:53.438Z [-] twisted.internet.protocol.DatagramProtocol starting on 2948
2012-02-01 01:36:53.439Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x202adedd0>
2012-02-01 01:36:53.450Z [-] (Port 2948 Closed)
2012-02-01 01:36:53.452Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x202adedd0>
2012-02-01 01:36:53.614Z [-] foolscap.pb.Listener starting on 35768
2012-02-01 01:36:54.296Z [-] twisted.internet.protocol.DatagramProtocol starting on 34635
2012-02-01 01:36:54.297Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x202adedd0>
2012-02-01 01:36:54.361Z [-] foolscap.pb.Listener starting on 37699
2012-02-01 01:36:54.362Z [-] nevow.appserver.NevowSite starting on 46227
2012-02-01 01:36:54.362Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x21d2cc488>
2012-02-01 01:36:54.363Z [-] My pid: 30725
2012-02-01 01:36:54.363Z [-] (Port 34635 Closed)
2012-02-01 01:36:54.364Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x202adedd0>
2012-02-01 01:36:54.365Z [-] twisted.internet.protocol.DatagramProtocol starting on 19862
2012-02-01 01:36:54.365Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21d937f80>
2012-02-01 01:36:54.366Z [-] (Port 19862 Closed)
2012-02-01 01:36:54.366Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21d937f80>
2012-02-01 01:36:54.439Z [-] foolscap.pb.Listener starting on 7790
2012-02-01 01:36:54.440Z [-] My pid: 30725
2012-02-01 01:36:54.487Z [-] foolscap.pb.Listener starting on 35570
2012-02-01 01:36:54.488Z [-] My pid: 30725
2012-02-01 01:36:54.537Z [-] foolscap.pb.Listener starting on 7049
2012-02-01 01:36:54.537Z [-] nevow.appserver.NevowSite starting on 21595
2012-02-01 01:36:54.538Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x2085d2ef0>
2012-02-01 01:36:54.538Z [-] My pid: 30725
2012-02-01 01:36:54.586Z [-] foolscap.pb.Listener starting on 24152
2012-02-01 01:36:54.587Z [-] My pid: 30725
2012-02-01 01:36:54.589Z [-] twisted.internet.protocol.DatagramProtocol starting on 37407
2012-02-01 01:36:54.590Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x20e547b90>
2012-02-01 01:36:54.591Z [-] twisted.internet.protocol.DatagramProtocol starting on 30005
2012-02-01 01:36:54.592Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21dc17680>
2012-02-01 01:36:54.639Z [-] twisted.internet.protocol.DatagramProtocol starting on 6706
2012-02-01 01:36:54.639Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21e3a4098>
2012-02-01 01:36:54.662Z [-] twisted.internet.protocol.DatagramProtocol starting on 30125
2012-02-01 01:36:54.663Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21dc2bc68>
2012-02-01 01:36:54.668Z [-] (Port 37407 Closed)
2012-02-01 01:36:54.669Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x20e547b90>
2012-02-01 01:36:54.670Z [-] (Port 30005 Closed)
2012-02-01 01:36:54.670Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21dc17680>
2012-02-01 01:36:54.672Z [-] (Port 6706 Closed)
2012-02-01 01:36:54.672Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21e3a4098>
2012-02-01 01:36:54.673Z [-] (Port 30125 Closed)
2012-02-01 01:36:54.673Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21dc2bc68>
2012-02-01 01:36:55.806Z [-] Starting factory <HTTPClientFactory: http://127.0.0.1:19673/>
2012-02-01 01:36:55.877Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:19673/?t=json>
2012-02-01 01:36:55.877Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:19673/>
2012-02-01 01:36:55.908Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:19673/?t=json>
2012-02-01 01:36:57.423Z [-] (Port 46227 Closed)
2012-02-01 01:36:57.423Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x21d2cc488>
2012-02-01 01:36:57.424Z [-] (Port 37699 Closed)
2012-02-01 01:36:58.484Z [-] foolscap.pb.Listener starting on 37699
2012-02-01 01:36:58.485Z [-] nevow.appserver.NevowSite starting on 6309
2012-02-01 01:36:58.485Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x21d930ea8>
2012-02-01 01:36:58.486Z [-] My pid: 30725
2012-02-01 01:36:58.489Z [-] twisted.internet.protocol.DatagramProtocol starting on 34314
2012-02-01 01:36:58.490Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21d910b90>
2012-02-01 01:36:58.494Z [-] (Port 34314 Closed)
2012-02-01 01:36:58.494Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x21d910b90>
2012-02-01 01:37:05.172Z [-] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/>
2012-02-01 01:37:05.211Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:21595/>
2012-02-01 01:37:05.212Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/>
2012-02-01 01:37:05.227Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa>
2012-02-01 01:37:05.228Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:21595/>
2012-02-01 01:37:05.421Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir1>
2012-02-01 01:37:05.422Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI%3ADIR2%3Ayodtx22564zgatzzowv765umse%3Agnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/>
2012-02-01 01:37:05.766Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir1/mydata567>
2012-02-01 01:37:05.767Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI%3ADIR2%3Ayodtx22564zgatzzowv765umse%3Agnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir1/>
2012-02-01 01:37:05.971Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:CHK:gmdbvycnymujkjlj646vkngaw4:zyx5xzvredovnxvio2w6iavlikt7i4ie5yh7okbavsf6zo7c6d5q:3:10:112?filename=mydata567>
2012-02-01 01:37:05.972Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir1/mydata567>
2012-02-01 01:37:06.006Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri?uri=URI:CHK:gmdbvycnymujkjlj646vkngaw4:zyx5xzvredovnxvio2w6iavlikt7i4ie5yh7okbavsf6zo7c6d5q:3:10:112>
2012-02-01 01:37:06.007Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:CHK:gmdbvycnymujkjlj646vkngaw4:zyx5xzvredovnxvio2w6iavlikt7i4ie5yh7okbavsf6zo7c6d5q:3:10:112?filename=mydata567>
2012-02-01 01:37:06.060Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:CHK:gmdbvycnymujkjlj646vkngawy:zyx5xzvredovnxvio2w6iavlikt7i4ie5yh7okbavsf6zo7c6d5q:3:10:112?filename=mydata567>
2012-02-01 01:37:06.061Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI%3ACHK%3Agmdbvycnymujkjlj646vkngaw4%3Azyx5xzvredovnxvio2w6iavlikt7i4ie5yh7okbavsf6zo7c6d5q%3A3%3A10%3A112>
2012-02-01 01:37:06.076Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:06.077Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:CHK:gmdbvycnymujkjlj646vkngawy:zyx5xzvredovnxvio2w6iavlikt7i4ie5yh7okbavsf6zo7c6d5q:3:10:112?filename=mydata567>
2012-02-01 01:37:06.567Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:06.567Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:06.703Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/big.txt>
2012-02-01 01:37:06.707Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:08.019Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/big.txt>
2012-02-01 01:37:08.019Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/big.txt>
2012-02-01 01:37:08.734Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:08.736Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/big.txt>
2012-02-01 01:37:09.054Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:09.055Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:09.191Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/uri>
2012-02-01 01:37:09.192Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri/URI:DIR2:yodtx22564zgatzzowv765umse:gnlv5ryhwfjvzgqcedx5hlihwrbugh6bd5uiu3vupnsvhqqvq2qa/subdir3/new.txt>
2012-02-01 01:37:09.583Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:21595/uri>
2012-02-01 01:37:09.584Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/uri>
2012-02-01 01:37:10.382Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/status>
2012-02-01 01:37:10.383Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:21595/uri>
2012-02-01 01:37:10.433Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/status/up-229>
2012-02-01 01:37:10.433Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/status/>
2012-02-01 01:37:10.454Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/status/down-135>
2012-02-01 01:37:10.454Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/status/up-229>
2012-02-01 01:37:10.644Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/status/mapupdate-2143>
2012-02-01 01:37:10.645Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/status/down-135>
2012-02-01 01:37:10.659Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/status/publish-890>
2012-02-01 01:37:10.660Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/status/mapupdate-2143>
2012-02-01 01:37:10.674Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:6309/status/retrieve-1440>
2012-02-01 01:37:10.675Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/status/publish-890>
2012-02-01 01:37:10.687Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:6309/status/retrieve-1440>
2012-02-01 01:37:10.687Z [-] (Port 6309 Closed)
2012-02-01 01:37:10.688Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x21d930ea8>
2012-02-01 01:37:10.689Z [-] (Port 37699 Closed)
2012-02-01 01:37:10.690Z [-] (Port 24152 Closed)
2012-02-01 01:37:10.691Z [-] (Port 21595 Closed)
2012-02-01 01:37:10.691Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x2085d2ef0>
2012-02-01 01:37:10.692Z [-] (Port 7049 Closed)
2012-02-01 01:37:10.693Z [-] (Port 35570 Closed)
2012-02-01 01:37:10.693Z [-] (Port 7790 Closed)
2012-02-01 01:37:10.694Z [-] (Port 35768 Closed)
2012-02-01 01:37:10.695Z [-] (Port 19673 Closed)
2012-02-01 01:37:10.695Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x21d8becf8>
2012-02-01 01:37:10.696Z [-] (Port 19624 Closed)
2012-02-01 01:37:10.708Z [-] Main loop terminated.

comment:4 in reply to: ↑ 3 Changed at 2012-02-08T22:36:40Z by davidsarah

Replying to zooko:

test_filesystem just irreproducible FAILed again, and this time it happened on one my favorite buildslaves -- the eminently stable and well-performing "Kyle OpenBSD amd64":

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/Kyle%20OpenBSD%20amd64/builds/22

This time there was a different error message, which might be a useful clue:

twisted.web.error.Error: 404 Not Found

Since there's no traceback, I don't think we can reliably conclude that this is the same bug. test_filesystem is a huge test, and there could easily be more than one nondeterministic bug in it. Let's keep this ticket for the check that we know fails at src/allmydata/test/test_system.py?rev=5051#L1101.

comment:5 Changed at 2012-06-13T16:17:11Z by davidsarah

  • Owner changed from somebody to warner

warner recently changed something relating to race conditions in these tests, I think.

comment:6 Changed at 2012-06-13T16:17:24Z by davidsarah

  • Priority changed from major to normal

comment:7 Changed at 2012-06-14T18:43:37Z by warner

Oh, I think I see the problem. test/common.py (in bounce_client(), when it polls with wait_for_connections()) isn't looking for a Helper connection, just for the introducer and server connections. _check_connections() on common.py line 676 should also look for a helper connection (if a helper is configured).

(the change in 51a5aaa6 should fix a couple of race issues in test_system, but not this particular one)

comment:8 Changed at 2012-06-14T19:21:40Z by Brian Warner <warner@…>

  • Resolution set to fixed
  • Status changed from new to closed

In b2dcbbb62dc759d4:

test/common.py: fix race condition waiting for the helper connection

The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest?.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.

The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.

Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.

Closes #1467

comment:9 Changed at 2012-06-14T19:24:11Z by Brian Warner <warner@…>

In b2dcbbb62dc759d4:

test/common.py: fix race condition waiting for the helper connection

The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest?.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.

The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.

Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.

Closes #1467

comment:10 Changed at 2012-06-14T19:25:19Z by warner

  • Milestone changed from undecided to 1.10.0

comment:11 Changed at 2012-06-14T20:39:45Z by Brian Warner <warner@…>

In 5512/1.9.2:

test/common.py: fix race condition waiting for the helper connection

The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest?.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.

The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.

Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.

Closes #1467

comment:12 Changed at 2012-06-14T20:40:52Z by davidsarah

  • Milestone changed from 1.10.0 to 1.9.2

Reviewed and applied to 1.9.2.

comment:13 Changed at 2012-07-10T20:04:57Z by Brian Warner <warner@…>

In 5861/cloud-backend:

test/common.py: fix race condition waiting for the helper connection

The wait_for_connections() method, which is used at the start of
test_system to make sure that all the clients are connected to all the
servers, did not also wait for clients to be connected to their Helpers.
Every once in a while, the helper connection would take a bit longer,
and then
test_system.SystemTest?.test_filesystem._test_web._got_welcome_helper
would fail, because we'd check for a helper connection before it was
ready.

The fix is to modify wait_for_connections's polling predicate to look
for helper connections (if configured) as well as the regular
introducer- and server- connections.

Tested by temporarily adding a large (30s) delay to the connectTo() call
in Uploader.startService, simulating a long helper
connection-establishment delay. This makes the test fail consistently.
Then I fixed wait_for_connections(), and the test passed (slowly). Then
I removed the delay.

Closes #1467

Note: See TracTickets for help on using tickets.