#1648 closed defect (fixed)

assertion failure 'assert len(self._active_readers) >= self._required_shares' in mutable retrieve

Reported by: davidsarah Owned by: kevan
Priority: major Milestone: 1.9.2
Component: code-mutable Version: 1.9.0
Keywords: assertion error retrieve mutable regression Cc: stucard, zooko
Launchpad Bug:

Description (last modified by zooko)

Reported by Stuart Card:

c:\allmydata-tahoe-1.9.0-rebuild\src\allmydata\mutable\retrieve.py, line 502 in _activate_enough_peers
500    # XXX: don't just drop the Deferred. We need error-reporting
501    # but not flow-control here.
502    assert len(self._active_readers) >= self._required_shares

_active_readers	   List instance @ 0x3510850
_required_shares   1

This was seen with a 1.9.0 client connected to an LAE storage server using an S3 backend (on the ticket999-S3-backend branch), with 1/1/1 encoding parameters. It seems as though the client shouldn't have reported an error in this way even if it was caused by the server.

The assertion failure, at retrieve.py line 507, is in code that changed in 1.9.0, so it might be a regression.

Attachments (1)

CheckOpsAllBoxesChecked-Exception.htm (7.1 KB) - added by davidsarah at 2011-12-31T01:04:12Z.
HTML traceback

Download all attachments as: .zip

Change History (8)

comment:1 Changed at 2011-12-31T01:00:59Z by davidsarah

  • Cc stucard added

Changed at 2011-12-31T01:04:12Z by davidsarah

HTML traceback

comment:2 Changed at 2011-12-31T03:14:42Z by davidsarah

On 31/12/11 01:42, Stuart W. Card wrote:

On 12/30/2011 7:24 PM, David-Sarah wrote:

On 30/12/11 22:10, Stuart W. Card wrote:

... The second actual error was while attempting (for my first time) to run from the UI the checker operations. As the directory and its contents are small, I didn't worry about "expensive", and checked all 4 boxes. It popped a Python exception, in Foolscap/Twisted.

Again, the HTML file is in the attached ZIP archive.

It looks from the traceback in CheckOpsAllBoxesChecked-Exception.htm as though that error is either caused by a client-side problem that isn't specific to the S3 backend, or is being misreported by the client. I've filed <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1648> for it.

Does the error always happen for that directory? If yes, does it happen with Tahoe-LAFS 1.8.3?

I just tried it twice more, it appeared to work fine: gave me a bunch of JSON that I don't yet know enough about Tahoe internals to grok.

I never got 1.8.3 working on this particular box. [...]

comment:3 Changed at 2011-12-31T06:35:28Z by zooko

  • Cc zooko added
  • Description modified (diff)
  • Milestone changed from undecided to 1.9.1

comment:4 Changed at 2012-01-04T02:08:24Z by kevan

  • Owner set to kevan

I'll look into this.

comment:5 Changed at 2012-01-08T20:40:05Z by kevan

  • Keywords review-needed added

I made a pull request on GitHub. From the description there:

The Retrieve class gets used both for downloading, validating, decoding and decrypting remote shares (writing the plaintext to a consumer) and for validating remote shares, since the steps to validate the shares of a file are a subset of the steps to assemble that file into plaintext. There were several assertions throughout the code path used by the verification process that didn't take the verification use case into account, since they implicitly assumed that having fewer than k active readers was an error. This is true in the case of downloading a file, since the Retrieve should be aborted with an error before proceeding down this code path in that case, but not for a verification, which can sensibly continue with fewer than k shares. These commits test that this case is handled correctly, remove the assertions so that verification operations with fewer than k shares don't fail due to an assertion error, and add code to Retrieve._download_current_segment to ensure that the verification operation is aborted if all of the shares are corrupt.

I'm setting 'review-needed'. I guess if we're using GitHub, that means you should go look at the pull request. I can attach an equivalent patch to this ticket if anyone wants me to do that.

Last edited at 2012-03-05T22:32:12Z by zooko (previous) (diff)

comment:6 Changed at 2012-02-18T17:39:43Z by kevan

The pull request was landed a while ago. If the error is reliably reproducible, we should try to reproduce it with the patch, then close this ticket if the original issue appears to be fixed.

comment:7 Changed at 2012-03-13T05:51:11Z by nejucomo

  • Keywords review-needed removed
  • Resolution set to fixed
  • Status changed from new to closed

I just read these comments and the comments on https://github.com/warner/tahoe-lafs/pull/3 and it appears as if this is already merged with unit tests.

I also read over the process here: https://tahoe-lafs.org/trac/tahoe-lafs/wiki/PatchReviewProcess

I'm going to remove the "review-needed" tag because the pull request is merged and therefore I assume the patch was reviewed and the tests in git 486c438 "Add test_verify_mdmf_all_bad_sharedata" are correct and pass.

Note: See TracTickets for help on using tickets.