[tahoe-dev] [tahoe-lafs] #755: if there is an unrecoverable subdirectory, the deep-check report (both WUI and CLI) loses other information
tahoe-lafs
trac at tahoe-lafs.org
Fri Jan 7 05:31:20 UTC 2011
#755: if there is an unrecoverable subdirectory, the deep-check report (both WUI
and CLI) loses other information
-------------------------------+--------------------------------------------
Reporter: zooko | Owner: francois
Type: defect | Status: new
Priority: critical | Milestone: 1.8.2
Component: code-dirnodes | Version: 1.4.1
Resolution: | Keywords: usability error tahoe-check wui verify repair
Launchpad Bug: |
-------------------------------+--------------------------------------------
Changes (by warner):
* keywords: usability error tahoe-check wui verify repair review-needed
=> usability error tahoe-check wui verify
repair
* owner: davidsarah => francois
* status: assigned => new
Comment:
Good patch! I like the approach of making filenode.check_and_repair()
signal inability to repair by returning
{{{CheckAndRepairResults.repair_successful}}}=False instead of by
throwing an exception. A few things I'd like to see changed:
* we usually repair files that are unhealthy but recoverable. If repair
fails, the file should still be recoverable. The post-repair-results
are pessimistically being set to healthy=False recoverable=False
needs_rebalancing=False, when it's probably (and sometimes certainly)
more accurate to copy these values from the pre-repair-results. In
particular, we shouldn't scare users into thinking that repair
failures of "scratched" files (unhealthy but recoverable) indicate
unrecoverable files: this makes benign things like
{{{UnhappinessError}}} look like data loss. This should be fixed in
both mutable and immutable files.
* the newly-enabled test in {{{test_repairer.Repairer.test_harness}}}
(which previously got a {{{self.shouldFail()}}}) should be slightly
enhanced to check the return value of {{{check_and_repair()}}}. We
should verify that it has {{{crr.repair_attempted=True}}},
{{{crr.repair_successful=False}}}, and
{{{crr.post_repair_results.recoverable=False}}}
* we should add a similar test for mutable files that have had 8 shares
deleted. There's something awfully close in
{{{test_mutable.Repair.test_unrepairable_1share}}} .. it should be
changed to use {{{self._fn.check_and_repair()}}} instead of
{{{self._fn.repair()}}} . To be honest, I'm not sure why that test was
passing before, because from what I can tell it should have been
behaving the same way as immutable repair on an unrecoverable file.
* it's probably worth checking the code coverage when we exercise
{{{test_mutable}}} and make sure the new code is getting run
* do we have any tests that confirm deep-repair on a tree with an
unrecoverable file (or directory) makes it through to the end without
an errback? We probably do but I'd like to be sure.. probably
something in {{{test_deepcheck}}} exercises this.
* I see {{{test_deepcheck.py:DeepCheckWebBad.do_deepcheck_broken()}}}
asserts that an unrecoverable dirnode causes the traversal to halt. Is
this what we want? Is this ticket about making sure an unrecoverable
*file* doesn't halt a deep-repair, or about an unrecoverable
*dirnode*? (broken dirnodes are more significant than files, because
it means you've probably lost access to even more data). We certainly
want the deep-traversal to keep going and repair more things, but we
also need to make sure the user learns about the dead dirnode.
Otherwise, looks great! With those few changes we can land this one for
1.8.2!
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/755#comment:22>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-dev
mailing list