#1084 assigned defect

nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_{random_key,convergent}

Reported by: davidsarah
Owned by: zooko
Priority: major
Milestone: undecided
Component: code
Version: 1.7β
Keywords: test upload heisenbug
Cc:
Launchpad Bug:

Description

http://tahoe-lafs.org/buildbot/builders/FreeStorm%20CentOS5-i386/builds/17/steps/test/logs/stdio

[ERROR]: allmydata.test.test_system.SystemTest.test_upload_and_download_random_key

Traceback (most recent call last):
  File "/home/buildbot/tahoe-lafs/FreeStorm CentOS5-i386/build/src/allmydata/test/test_system.py", line 340, in _uploaded
    "resumption saved us some work even though we were using random keys:"
exceptions.TypeError: int argument required

Test log here.

The code around that line is here.

This did not happen on a subsequent build with only this change.

Change History (10)

comment:1 follow-up: Changed at 2010-06-16T01:25:01Z by davidsarah

self.failIf(bytes_sent < len(DATA),
            "resumption saved us some work even though we were using random keys:"
            " read %d bytes out of %d total" %
            (bytes_sent, len(DATA)))

Because there was an exception in the %d formatting of the message argument, we do not know what the value of bytes_sent was. It doesn't seem to have been None because that would have produced a different error message:

Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02)
[MSC v.1500 32 bit (Intel)] on win32
...
>>> None < 42
True
>>> "%d" % (None,)
...
TypeError: %d format: a number is required, not NoneType

(but maybe there is a difference between Python 2.4.3 and 2.6.2 here.)
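The reason the value was lost can be shown in isolation: the message argument is built with % before failIf is ever called, so a non-integer bytes_sent raises TypeError during formatting and the assertion itself never runs. The following is a minimal sketch in Python 3 syntax, not the code in test_system.py; the helper names format_error and failure_message are hypothetical. Using %r instead of %d is one way to keep the message from failing on unexpected values.

```python
def format_error(bytes_sent, total):
    """Return the TypeError text raised by %d formatting, or None if it succeeds.

    Hypothetical helper: it mimics how the test's message argument is
    evaluated eagerly, before the assertion method is called.
    """
    try:
        "read %d bytes out of %d total" % (bytes_sent, total)
    except TypeError as e:
        return str(e)
    return None


def failure_message(bytes_sent, total):
    """Build the failure message with %r, which accepts any object.

    Hypothetical helper: with %r the message shows the actual value
    (for example None) instead of raising while it is being formatted.
    """
    return ("resumption saved us some work even though we were using "
            "random keys: read %r bytes out of %r total" % (bytes_sent, total))


if __name__ == "__main__":
    print(format_error(None, 100))      # formatting fails, value is hidden
    print(failure_message(None, 100))   # formatting succeeds, value is visible
```

With a message built this way, a failing run would report what bytes_sent actually was rather than a formatting error.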

comment:2 in reply to: ↑ 1 Changed at 2010-06-16T01:53:05Z by davidsarah

Replying to davidsarah:

Because there was an exception in the %d formatting of the message argument, we do not know what the value of bytes_sent was. It doesn't seem to have been None because that would have produced a different error message: ... (but maybe there is a difference between Python 2.4.3 and 2.6.2 here.)

There was; discount this argument. bytes_sent might have been None.

comment:3 Changed at 2011-07-31T21:01:45Z by davidsarah

  • Keywords centos removed
  • Summary changed from nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_random_key on CentOS builder to nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_random_key

#1273 was probably a duplicate. That failure occurred on Windows Vista, so the problem is not specific to CentOS or the CentOS builder. The error message is not exactly the same (I think an assertion was added), but it seems to be due to the same type error.

comment:4 Changed at 2011-08-02T01:29:45Z by davidsarah

  • Summary changed from nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_random_key to nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_{random_key,convergent}

In http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/745/steps/test/logs/stdio , this problem happens for both test_upload_and_download_random_key and test_upload_and_download_convergent:

[FAIL]
Traceback (most recent call last):
  File "/home/arthur/buildbot/Arthur lenny c7 32bit/build/src/allmydata/test/test_system.py", line 329, in _uploaded
    self.failUnless(isinstance(bytes_sent, (int, long)), bytes_sent)
twisted.trial.unittest.FailTest: None

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
allmydata.test.test_system.SystemTest.test_upload_and_download_random_key

comment:5 Changed at 2011-09-09T05:54:30Z by zooko

A problem with similar characteristics happened just now on Ruben's Fedora buildslave:

failed: http://tahoe-lafs.org/buildbot/builders/Ruben%20Fedora/builds/864

Ended with:

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ... Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "/home/buildbot/tahoe/Ruben Fedora/build/src/allmydata/util/pollmixin.py", line 34, in _poll
    raise TimeoutError("PollMixin never saw %s return True" % check_f)
allmydata.util.pollmixin.TimeoutError: PollMixin never saw <bound method SystemTest._check_connections of <allmydata.test.test_system.SystemTest testMethod=test_upload_and_download_random_key>> return True
[ERROR]Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_system.SystemTest testMethod=test_upload_and_download_random_key> (tearDown) still running at 3600.0 secs
[ERROR]Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x534e5f0 [39.4281477928s] called=0 cancelled=0 LoopingCall<60>(CPUUsageMonitor.check, *(), **{})()>
<DelayedCall 0x48e40e0 [39759.6027431s] called=0 cancelled=0 LeaseCheckingCrawler.start_slice()>
<DelayedCall 0x4bcb710 [99.8050701618s] called=0 cancelled=0 BucketCountingCrawler.start_slice()>
[ERROR]Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
Selectables:
<<class 'twisted.internet.tcp.Port'> of foolscap.pb.Listener on 42824>
[ERROR]
command timed out: 7200 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=11884.623658

Rebuilding the exact same version passed: http://tahoe-lafs.org/buildbot/builders/Ruben%20Fedora/builds/865

comment:6 Changed at 2011-09-09T05:58:10Z by zooko

I remember having a brainstorm that bytes_sent would be None if the helper had not yet been connected to the node when the node started its upload. So I hypothesized that there is a race condition in the test setup between the node connecting to the helper and the node starting its upload.

I'm not sure whether that applies to the new issue from comment:5. Also, I thought I wrote some notes about this last time, but they are not on this ticket. Is there a different (redundant) ticket somewhere? Did I post my notes somewhere other than Trac? I will investigate...
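If the race hypothesized above is real, one way to rule it out in a test would be to wait until the helper connection is established before starting the upload. The following is a blocking, stdlib-only sketch of that idea, assuming some zero-argument predicate reports the connection state. It is not Tahoe-LAFS's PollMixin, which is Deferred-based; poll_until, PollTimeout and FakeNode are hypothetical names used only for illustration.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition never became true (hypothetical name)."""


def poll_until(check, timeout=5.0, interval=0.01):
    """Call check() repeatedly until it returns a true value.

    Raises PollTimeout if `timeout` seconds pass first. This is a blocking
    illustration only; a Twisted test would return a Deferred instead.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise PollTimeout("condition %r never became true" % (check,))
        time.sleep(interval)


class FakeNode(object):
    """Stand-in for a client node whose helper connects after a few polls."""

    def __init__(self, polls_until_connected):
        self._remaining = polls_until_connected

    def helper_connected(self):
        # Each poll brings the fake connection one step closer.
        self._remaining -= 1
        return self._remaining <= 0


if __name__ == "__main__":
    node = FakeNode(polls_until_connected=3)
    poll_until(node.helper_connected, timeout=1.0, interval=0.001)
    print("helper connected; safe to start the upload")
```

Gating the upload on such a check would make the ordering explicit, so a None bytes_sent could no longer come from the upload starting first.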

comment:7 Changed at 2011-09-09T18:21:15Z by davidsarah

comment:5 looks like #846 to me. (It's in a different test method, but the error is almost identical to the one in http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/846/tahoe-846-test-fail.txt.)

I don't see any strong evidence that comment:5 and #846 are the same bug as #1084 and #1273.

comment:8 Changed at 2011-09-28T16:22:47Z by zooko

  • Owner changed from somebody to zooko
  • Status changed from new to assigned