#1197 closed defect (fixed)

nondeterministic failure of allmydata.test.test_download.DownloadTest.test_lost_servers with NotEnoughSharesError

Reported by: davidsarah Owned by: zooko
Priority: major Milestone: 1.8.0
Component: code-network Version: 1.8β
Keywords: reviewed immutable download Cc:
Launchpad Bug:

Description (last modified by davidsarah)

http://tahoe-lafs.org/buildbot/builders/FreeStorm%20WinXP-x86%20py2.6/builds/273/steps/test/logs/stdio :

[ERROR]: allmydata.test.test_download.DownloadTest.test_lost_servers

Traceback (most recent call last):
Failure: allmydata.interfaces.NotEnoughSharesError: ran out of
shares: complete=sh4 pending=Share(sh7-on-k6vb2) overdue= unused=
need 3. Last failure: [Failure instance: Traceback (failure with no
frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException
around '[Failure instance: Traceback: <class
'allmydata.test.no_network.IntentionalError'>: I was asked to break
c:\python26\lib\site-packages\twisted\internet\base.py:778:runUntilCurrent
build\bdist.win32\egg\foolscap\eventual.py:26:_turn
c:\python26\lib\site-packages\twisted\internet\defer.py:280:callback
c:\python26\lib\site-packages\twisted\internet\defer.py:354:_startRunCallbacks
--- <exception caught here> ---
c:\python26\lib\site-packages\twisted\internet\defer.py:371:_runCallbacks
c:\buildbot_tahoe\freestorm_winxp-x86_py2.6\build\src\allmydata\test\no_network.py:81:<lambda>
c:\buildbot_tahoe\freestorm_winxp-x86_py2.6\build\src\allmydata\test\no_network.py:68:_call
]'>
]

This test succeeds on other builders, and in the next build of the same source.

Change History (9)

comment:1 Changed at 2010-09-11T02:23:25Z by davidsarah

  • Description modified (diff)

comment:2 Changed at 2010-09-11T03:27:35Z by zooko

I have seen this a couple of times now. I first saw it after applying 0475bd8e27ae8307. I suspect 0475bd8e27ae8307 causes it. :-(

comment:3 Changed at 2010-09-11T04:22:39Z by zooko

  • Owner set to zooko
  • Status changed from new to assigned

comment:4 Changed at 2010-09-11T06:18:13Z by zooko

Ugh, I spent a few hours just now trying to make allmydata.test.test_download.DownloadTest.test_lost_servers deterministic and was not able to do so. Now that I'm writing this, I think maybe I don't even want allmydata.test.test_download.DownloadTest.test_lost_servers to be deterministic. It is basically a "system test". Maybe it is okay to leave it with some non-determinism (which I assume comes from iterating over a dict or a set of servers) in order to show up issues like this one, as long as also write a deterministic, narrowed test for each such issue that shows up.

Okay I'll try that next...

comment:5 Changed at 2010-09-11T07:19:44Z by zooko

Here is the last part of _trial_temp/test.log (created with FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1 PYTHONPATH=src/ trial --rterror allmydata.test.test_download.DownloadTest.test_lost_servers) of a run in which the download succeeded:

2010-09-11 00:00:19-0600 [-] imm Node(54cbd6tx).get_segment(1)
2010-09-11 00:00:19-0600 [-] error requesting 60+24 from 6jdsp for si 54cbd6tx
2010-09-11 00:00:19-0600 [-] abandoning Share(sh0-on-6jdsp)
2010-09-11 00:00:19-0600 [-] error requesting 60+24 from ysbz4 for si 54cbd6tx
2010-09-11 00:00:19-0600 [-] abandoning Share(sh1-on-ysbz4)
2010-09-11 00:00:19-0600 [-] error requesting 60+24 from k6vb2 for si 54cbd6tx
2010-09-11 00:00:19-0600 [-] abandoning Share(sh2-on-k6vb2)
2010-09-11 00:00:19-0600 [-] error requesting 0+1024 from alg6u for si 54cbd6tx
2010-09-11 00:00:19-0600 [-] abandoning Share(sh3-on-alg6u)
2010-09-11 00:00:19-0600 [-] got error from [xgru5adv]
2010-09-11 00:00:19-0600 [-] storage: get_buckets 54cbd6txyhbfqrdoclhpinsux4
2010-09-11 00:00:19-0600 [-] storage: get_buckets 54cbd6txyhbfqrdoclhpinsux4
2010-09-11 00:00:19-0600 [-] got error from [rvsry4kn]
2010-09-11 00:00:19-0600 [-] storage: get_buckets 54cbd6txyhbfqrdoclhpinsux4
2010-09-11 00:00:19-0600 [-] got error from [fob7vqgd]
2010-09-11 00:00:19-0600 [-] delivering segment(1)
2010-09-11 00:00:19-0600 [-] Segmentation got data: want [72-310), given [72-144), for segnum=1
2010-09-11 00:00:19-0600 [-] imm Node(54cbd6tx).get_segment(2)
2010-09-11 00:00:19-0600 [-] delivering segment(2)

Here is the comparable section of a run in which the download failed:

2010-09-11 00:05:55-0600 [-] imm Node(x36qlfft).get_segment(1)
2010-09-11 00:05:55-0600 [-] error requesting 60+24 from alg6u for si x36qlfft
2010-09-11 00:05:55-0600 [-] abandoning Share(sh0-on-alg6u)
2010-09-11 00:05:55-0600 [-] error requesting 60+24 from egjce for si x36qlfft
2010-09-11 00:05:55-0600 [-] abandoning Share(sh1-on-egjce)
2010-09-11 00:05:55-0600 [-] error requesting 60+24 from k6vb2 for si x36qlfft
2010-09-11 00:05:55-0600 [-] abandoning Share(sh2-on-k6vb2)
2010-09-11 00:05:55-0600 [-] error requesting 0+1024 from 7r4gd for si x36qlfft
2010-09-11 00:05:55-0600 [-] abandoning Share(sh3-on-7r4gd)
2010-09-11 00:05:55-0600 [-] storage: get_buckets x36qlfft7g5c7raim6y7yb25ju
2010-09-11 00:05:55-0600 [-] storage: get_buckets x36qlfft7g5c7raim6y7yb25ju
2010-09-11 00:05:55-0600 [-] got error from [fob7vqgd]
2010-09-11 00:05:55-0600 [-] got error from [rvsry4kn]
2010-09-11 00:05:55-0600 [-] got error from [ysbz4st7]
2010-09-11 00:05:55-0600 [-] got error from [xgru5adv]
2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever
2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever
2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever
2010-09-11 00:05:55-0600 [-] ShareFinder.loop: no_more_shares, ever
2010-09-11 00:05:55-0600 [-] ran out of shares: complete= pending=Share(sh4-on-6jdsp),Share(sh5-on-b3llg) overdue= unused= need 3. Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <class 'allmydata.test.no_network.IntentionalError'>: I was asked to break
        /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/base.py:789:runUntilCurrent
        /Library/Python/2.6/site-packages/foolscap-0.5.1-py2.6.egg/foolscap/eventual.py:26:_turn
        /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:351:callback
        /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:450:_startRunCallbacks
        --- <exception caught here> ---
        /Library/Python/2.6/site-packages/Twisted-10.1.0_r29874-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:468:_runCallbacks
        /Users/zooko/playground/tahoe-lafs/trunk/src/allmydata/test/no_network.py:81:<lambda>
        /Users/zooko/playground/tahoe-lafs/trunk/src/allmydata/test/no_network.py:68:_call
        ]'>
        ]

comment:6 Changed at 2010-09-12T00:52:32Z by david-sarah@…

In 6c4ba62c5d3a7624:

test_download.py: simplify and possibly fix test_lost_servers; refs #1197

comment:7 Changed at 2010-09-12T01:42:31Z by davidsarah

  • Keywords review-needed added; heisenbug removed
  • Status changed from assigned to new

That seems to have worked.

comment:8 Changed at 2010-09-12T07:25:32Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

reviewed. Thanks!! :-)

comment:9 Changed at 2010-09-12T07:25:48Z by zooko

  • Keywords reviewed added; review-needed removed
Note: See TracTickets for help on using tickets.