#1197 closed defect (fixed)
nondeterministic failure of allmydata.test.test_download.DownloadTest.test_lost_servers with NotEnoughSharesError
Reported by: | davidsarah | Owned by: | zooko |
---|---|---|---|
Priority: | major | Milestone: | 1.8.0 |
Component: | code-network | Version: | 1.8β |
Keywords: | reviewed immutable download | Cc: | |
Launchpad Bug: |
Description (last modified by davidsarah)
[ERROR]: allmydata.test.test_download.DownloadTest.test_lost_servers Traceback (most recent call last): Failure: allmydata.interfaces.NotEnoughSharesError: ran out of shares: complete=sh4 pending=Share(sh7-on-k6vb2) overdue= unused= need 3. Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <class 'allmydata.test.no_network.IntentionalError'>: I was asked to break c:\python26\lib\site-packages\twisted\internet\base.py:778:runUntilCurrent build\bdist.win32\egg\foolscap\eventual.py:26:_turn c:\python26\lib\site-packages\twisted\internet\defer.py:280:callback c:\python26\lib\site-packages\twisted\internet\defer.py:354:_startRunCallbacks --- <exception caught here> --- c:\python26\lib\site-packages\twisted\internet\defer.py:371:_runCallbacks c:\buildbot_tahoe\freestorm_winxp-x86_py2.6\build\src\allmydata\test\no_network.py:81:<lambda> c:\buildbot_tahoe\freestorm_winxp-x86_py2.6\build\src\allmydata\test\no_network.py:68:_call ]'> ]
This test succeeds on other builders, and in the next build of the same source.
Change History (9)
comment:1 Changed at 2010-09-11T02:23:25Z by davidsarah
- Description modified (diff)
comment:2 Changed at 2010-09-11T03:27:35Z by zooko
comment:3 Changed at 2010-09-11T04:22:39Z by zooko
- Owner set to zooko
- Status changed from new to assigned
comment:4 Changed at 2010-09-11T06:18:13Z by zooko
Ugh, I spent a few hours just now trying to make allmydata.test.test_download.DownloadTest.test_lost_servers deterministic and was not able to do so. Now that I'm writing this, I think maybe I don't even want allmydata.test.test_download.DownloadTest.test_lost_servers to be deterministic. It is basically a "system test". Maybe it is okay to leave it with some non-determinism (which I assume comes from iterating over a dict or a set of servers) in order to show up issues like this one, as long as also write a deterministic, narrowed test for each such issue that shows up.
Okay I'll try that next...
comment:5 Changed at 2010-09-11T07:19:44Z by zooko
Here is the last part of _trial_temp/test.log (created with FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1 PYTHONPATH=src/ trial --rterror allmydata.test.test_download.DownloadTest.test_lost_servers) of a run in which the download succeeded:
2010-09-11 00:00:19-0600 [-] imm Node(54cbd6tx).get_segment(1) 2010-09-11 00:00:19-0600 [-] error requesting 60+24 from 6jdsp for si 54cbd6tx 2010-09-11 00:00:19-0600 [-] abandoning Share(sh0-on-6jdsp) 2010-09-11 00:00:19-0600 [-] error requesting 60+24 from ysbz4 for si 54cbd6tx 2010-09-11 00:00:19-0600 [-] abandoning Share(sh1-on-ysbz4) 2010-09-11 00:00:19-0600 [-] error requesting 60+24 from k6vb2 for si 54cbd6tx 2010-09-11 00:00:19-0600 [-] abandoning Share(sh2-on-k6vb2) 2010-09-11 00:00:19-0600 [-] error requesting 0+1024 from alg6u for si 54cbd6tx 2010-09-11 00:00:19-0600 [-] abandoning Share(sh3-on-alg6u) 2010-09-11 00:00:19-0600 [-] got error from [xgru5adv] 2010-09-11 00:00:19-0600 [-] storage: get_buckets 54cbd6txyhbfqrdoclhpinsux4 2010-09-11 00:00:19-0600 [-] storage: get_buckets 54cbd6txyhbfqrdoclhpinsux4 2010-09-11 00:00:19-0600 [-] got error from [rvsry4kn] 2010-09-11 00:00:19-0600 [-] storage: get_buckets 54cbd6txyhbfqrdoclhpinsux4 2010-09-11 00:00:19-0600 [-] got error from [fob7vqgd] 2010-09-11 00:00:19-0600 [-] delivering segment(1) 2010-09-11 00:00:19-0600 [-] Segmentation got data: want [72-310), given [72-144), for segnum=1 2010-09-11 00:00:19-0600 [-] imm Node(54cbd6tx).get_segment(2) 2010-09-11 00:00:19-0600 [-] delivering segment(2)
Here is the comparable section of a run in which the download failed:
2010-09-11 00:05:55-0600 [-] imm Node(x36qlfft).get_segment(1) 2010-09-11 00:05:55-0600 [-] error requesting 60+24 from alg6u for si x36qlfft 2010-09-11 00:05:55-0600 [-] abandoning Share(sh0-on-alg6u) 2010-09-11 00:05:55-0600 [-] error requesting 60+24 from egjce for si x36qlfft 2010-09-11 00:05:55-0600 [-] abandoning Share(sh1-on-egjce) 2010-09-11 00:05:55-0600 [-] error requesting 60+24 from k6vb2 for si x36qlfft 2010-09-11 00:05:55-0600 [-] abandoning Share(sh2-on-k6vb2) 2010-09-11 00:05:55-0600 [-] error requesting 0+1024 from 7r4gd for si x36qlfft 2010-09-11 00:05:55-0600 [-] abandoning Share(sh3-on-7r4gd) 2010-09-11 00:05:55-0600 [-] storage: get_buckets x36qlfft7g5c7raim6y7yb25ju 2010-09-11 00:05:55-0600 [-] storage: get_buckets x36qlfft7g5c7raim6y7yb25ju 2010-09-11 00:05:55-0600 [-] got error from [fob7vqgd] 2010-09-11 00:05:55-0600 [-] got error from [rvsry4kn] 2010-09-11 00:05:55-0600 [-] got error from [ysbz4st7] 2010-09-11 00:05:55-0600 [-] got error from [xgru5adv] 2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever 2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever 2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever 2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever 2010-09-11 00:05:55-0600 [-] ran out of shares: complete= pending=Share(sh4-on-6jdsp),Share(sh5-on-b3llg) overdue= unused= need 3. Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <class 'allmydata.test.no_network.IntentionalError'>: I was asked to break /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/base.py:789:runUntilCurrent /Library/Python/2.6/site-packages/foolscap-0.5.1-py2.6.egg/foolscap/eventual.py:26:_turn /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:351:callback /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:450:_startRunCallbacks --- <exception caught here> --- /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:468:_runCallbacks /Users/zooko/playground/tahoe-lafs/trunk/src/allmydata/test/no_network.py:81:<lambda> /Users/zooko/playground/tahoe-lafs/trunk/src/allmydata/test/no_network.py:68:_call ]'> ]
comment:6 Changed at 2010-09-12T00:52:32Z by david-sarah@…
In 6c4ba62c5d3a7624:
comment:7 Changed at 2010-09-12T01:42:31Z by davidsarah
- Keywords review-needed added; heisenbug removed
- Status changed from assigned to new
That seems to have worked.
comment:8 Changed at 2010-09-12T07:25:32Z by zooko
- Resolution set to fixed
- Status changed from new to closed
reviewed. Thanks!! :-)
comment:9 Changed at 2010-09-12T07:25:48Z by zooko
- Keywords reviewed added; review-needed removed
I have seen this a couple of times now. I first saw it after applying 0475bd8e27ae8307. I suspect 0475bd8e27ae8307 causes it. :-(