[tahoe-dev] behavior of an immutable file repairer

Brian Warner warner at lothar.com
Mon Oct 27 08:23:40 PDT 2008


Looks good to me. You might add a note about what happens when  
verify=True and corrupt shares are detected (specifically that we  
can't automatically delete the share, so it is announced in the  
results so that someone can investigate it manually). Also, it  
sounds like a server with a corrupt share is excluded from receiving  
any new shares (it becomes a distrusted server), which is a  
reasonable policy, but it could be made slightly more explicit in the  
docstring. Also also, you might note the bandwidth costs of the  
different modes: verify=False is "dumb"/"trusting", but I expect we  
would run it more often than verify=True, to save the N/k*filesize  
download cost.
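
For scale: with 3-of-10 encoding (k=3, N=10, just example parameters
here), verify=True reads back every share, so it costs roughly N/k
times the file size in download bandwidth:

# Illustrative only: estimate the extra download cost of verify=True.
# The k=3, n=10 defaults are example encoding parameters.
def verify_download_cost(filesize, k=3, n=10):
    # each of the N shares is about filesize/k bytes, and verification
    # downloads all of them, so the total is roughly N/k * filesize
    return float(n) / k * filesize

print(verify_download_cost(10*1000*1000))  # 10 MB file -> ~33 MB downloaded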

Also, we should consider adding a timeout argument to these methods,  
to deal with servers that lock up and never respond to a query. We've  
had this happen in the production grid before. The Foolscap timers may  
be enough, but their values must be set with care (especially if the  
client is behind a slow connection) to avoid false timeouts, so a more  
direct timeout policy might be better.
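
Something along these lines would do it; this is just a sketch of the
policy, and the do_query() callable and the 60-second default are
stand-ins, not part of any existing interface:

from twisted.internet import defer, reactor

def query_with_timeout(do_query, seconds=60):
    # do_query() is assumed to return a Deferred for one server query.
    # If it hasn't fired within `seconds`, errback our wrapper instead,
    # so a wedged server can't make the checker/verifier wait forever.
    result = defer.Deferred()
    timer = reactor.callLater(seconds, result.errback,
                              defer.TimeoutError("server did not answer"))
    def _fire(fire_func, value):
        if timer.active():
            timer.cancel()
            fire_func(value)
    d = do_query()
    d.addCallbacks(lambda v: _fire(result.callback, v),
                   lambda f: _fire(result.errback, f))
    return result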

Incidentally, I added code the other night to give downloading/  
verifying clients a channel to advise the storage servers about  
corruption detected in their shares (a new method in the  
RIStorageServer protocol), so we don't have to rely upon somebody  
acting on the CheckAndRepairResults information. I added calls in the  
immutable downloader; feel free to use it in the immutable verifier  
too.
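
Roughly, the verifier would call it like this when it notices a bad
block (the method name and argument list here are from memory, so
treat this as a sketch and check RIStorageServer for the real
signature):

def report_corruption(serverref, storage_index, shnum, reason):
    # fire-and-forget advisory to the server holding the bad share;
    # serverref is a foolscap RemoteReference to that storage server
    d = serverref.callRemote("advise_corrupt_share",
                             "immutable", storage_index, shnum, reason)
    # it is purely advisory, so swallow errors: a broken or malicious
    # server must not be able to stall the verifier
    d.addErrback(lambda f: None)
    return d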

cheers,
  -Brian

On Oct 26, 2008, at 7:56 PM, zooko <zooko at zooko.com> wrote:

> Folks:
>
> I was on an airplane today headed for ACM CCS 2008, and I did some
> work on the immutable file repairer.  Here is the current docstring
> and constructor signature.
> Comments welcome!
>
> Regards,
>
> Zooko
>
> class ImmutableFileRepairer(object):
>     """ I have two phases -- check phase and repair phase.  In the
>     first phase -- check phase, I query servers (in
>     permuted-by-storage-index order) until I am satisfied that all M
>     uniquely-numbered shares are available (or I run out of servers).
>
>     If the verify flag was passed to my constructor, then for each
>     share I download every data block and all metadata from each
>     server and perform a cryptographic integrity check on all of it.
>     If not, I just ask each server "Which shares do you have?" and
>     believe its answer.
>
>     In either case, I wait until I have either gotten satisfactory
>     information about all M uniquely-numbered shares, or have run out
>     of servers to ask.  (This fact -- that I wait -- means that an
>     ill-behaved server which fails to answer my questions will make me
>     wait indefinitely.  If it is ill-behaved in a way that triggers
>     the underlying foolscap timeouts, then I will wait only as long as
>     those foolscap timeouts, but if it is ill-behaved in a way which
>     placates the foolscap timeouts but still doesn't answer my
>     question then I will wait indefinitely.)
>
>     Then, if I was not satisfied that all M of the shares are
>     available from at least one server, and if the repair flag was
>     passed to my constructor, I enter the repair phase.  In the repair
>     phase, I generate any shares which were not available and upload
>     them to servers.
>
>     Which servers?  Well, I take the list of servers and if I was in
>     verify mode during the check phase then I exclude any servers
>     which claimed to have a share but then failed to serve it up, or
>     served up a corrupted one, when I asked for it.  (If I was not in
>     verify mode, then I don't exclude any servers, not even servers
>     which, when I subsequently attempt to download the file during
>     repair, claim to have a share but then fail to produce it, or
>     produce a corrupted share, because when I am not in verify mode
>     then I am dumb.)  Then I perform the normal server-selection
>     process of permuting the order of the servers by the storage
>     index, and choosing the next server which doesn't already have
>     more shares than others.
>
>     My process of uploading replacement shares proceeds in a
>     segment-wise fashion -- first I ask servers if they can hold the
>     new shares, and once enough have agreed then I download the first
>     segment of the file and upload the first block of each replacement
>     share, and only after all those blocks have been uploaded do I
>     download the second segment of the file and upload the second
>     block of each replacement share to its respective server.  (I do
>     it this way in order to minimize the amount of downloading I have
>     to do and the amount of memory I have to use at any one time.)
>
>     If any of the servers to which I am uploading replacement shares
>     fails to accept the blocks during this process, then I just stop
>     using that server, abandon any share-uploads that were going to
>     that server, and proceed to finish uploading the remaining shares
>     to their respective servers.  At the end of my work, I produce an
>     object which satisfies the ICheckAndRepairResults interface (by
>     firing the deferred that I returned from start() and passing that
>     check-and-repair-results object).
>
>     Along the way, before I send another request on the network I
>     always ask the "monitor" object that was passed into my
>     constructor whether this task has been cancelled (by invoking its
>     raise_if_cancelled() method).
>     """
>     def __init__(self, client, verifycap, servers, verify, repair,
>                  monitor):
>         precondition(isinstance(verifycap, CHKFileVerifierURI), verifycap)
>         precondition(isinstance(servers, set), servers)
>         for (serverid, serverref) in servers:
>             precondition(isinstance(serverid, str), serverid)
>
> _______________________________________________
> tahoe-dev mailing list
> tahoe-dev at allmydata.org
> http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
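
A rough sketch of the segment-wise upload loop described in the
docstring above, just to make the pipelining concrete (the
downloader/encoder/uploader objects and their methods here are
placeholders, not the real interfaces):

from twisted.internet import defer

@defer.inlineCallbacks
def repair_segments(downloader, encoder, uploaders, num_segments):
    for segnum in range(num_segments):
        segment = yield downloader.get_segment(segnum)  # fetch each segment once
        blocks = yield encoder.encode(segment)          # one block per new share
        # push this segment's blocks to all replacement shares in
        # parallel, and only then move on to the next segment, so at
        # most one segment's worth of data is in memory at a time
        yield defer.gatherResults(
            [u.put_block(segnum, blocks[u.shnum]) for u in uploaders])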

