[tahoe-lafs-trac-stream] [Tahoe-LAFS] #755: Allow deep-check to continue after error, and: if there is an unrecoverable subdirectory, the deep-check report (both WUI and CLI) loses other information

Tahoe-LAFS trac at tahoe-lafs.org
Tue Aug 21 21:48:05 UTC 2018


#755: Allow deep-check to continue after error, and: if there is an unrecoverable
subdirectory, the deep-check report (both WUI and CLI) loses other
information
-------------------------------+-----------------------------------------------------------
     Reporter:  zooko          |      Owner:  daira
         Type:  defect         |     Status:  new
     Priority:  critical       |  Milestone:  soon
    Component:  code-dirnodes  |    Version:  1.4.1
   Resolution:                 |   Keywords:  usability error tahoe-check wui verify repair
Launchpad Bug:                 |
-------------------------------+-----------------------------------------------------------

Comment (by tlhonmey):

 OK, so tahoe manifest also gives up on the first error it encounters; it
 just only encounters errors on damaged directories.  But it will still
 bite you hard if you are actually stupid enough to rely on it.

 So I've resorted to the following bash script:

 {{{
 #! /bin/bash
 tahoe="/home/tahoe/tahoe/bin/tahoe"
 THREADS=5
 FAILEDLOG="/tmp/failed.txt"


 recurser() {
   CHILDREN=""
   echo "checking directory: $1"
   # If both the check and the repair fail, sleep 5 minutes before continuing
   # to let the grid come back up if this is a connection failure.  This
   # prevents the entire script from finishing as failures if the network
   # connection goes down.
   $tahoe check --add-lease "$1" || $tahoe check --add-lease --repair "$1" || sleep 5m
   local ITEM
   for ITEM in $($tahoe ls -F "$1"); do
     echo "checking: ${1}${ITEM}"
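     echo "$ITEM" | grep "/" >> /dev/null && echo "  Is a directory..." && recurser "${1}${ITEM}"
     # Check in the background; try a repair on failure and log anything still failing.
     ( $tahoe check --add-lease "${1}${ITEM}" || $tahoe check --repair --add-lease "${1}${ITEM}" || echo "${1}${ITEM}" >> "$FAILEDLOG" ) &
     # $! is the PID of the background job just started ($? would only be the launch status).
     CHILDREN="$! $CHILDREN"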
     if [[ $(echo "$CHILDREN" | wc -w) == "$THREADS" ]]; then
       wait
       CHILDREN=""
     fi
   done
 }


 echo "If it blows up immediately when passed a URI make sure you end it with a /"
 recurser "$1"

 }}}
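
 The batching idea in the script (start a few checks in the background, then
 wait for them before starting more) can be sketched on its own.  This is
 only an illustration: check_item is a hypothetical stand-in for a real
 "tahoe check" call, and run_batched and its arguments are names invented
 for this sketch.

```shell
# Hypothetical stand-in for "tahoe check": fails for names containing "bad".
check_item() {
  case "$1" in
    *bad*) return 1 ;;
    *)     return 0 ;;
  esac
}

# run_batched MAX_JOBS FAILED_LOG ITEM...
# Checks every ITEM, running at most MAX_JOBS checks at a time, and appends
# the name of each failing item to FAILED_LOG.
run_batched() {
  max_jobs=$1
  failed_log=$2
  shift 2
  running=0
  for item in "$@"; do
    ( check_item "$item" || echo "$item" >> "$failed_log" ) &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
      wait          # block until the whole batch has finished
      running=0
    fi
  done
  wait              # collect any jobs left in the final partial batch
}
```

 For example, "run_batched 5 /tmp/failed.txt item1 item2 item3" would run up
 to five checks at once.  Entries in the log may appear in any order, since
 the background jobs finish independently.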


 The careful observer will notice that this script calls
 "check --add-lease" first and then only calls --repair if that returns an
 error.  This is due to another bug in the --repair functionality which I
 will be filing shortly.



 Is making deep-check note the unrepairable nodes, but then continue to
 check the rest of the tree really that difficult?  I wouldn't think the
 average user should have to resort to writing their own tools to avoid
 cascade failures of the storage system...
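
 To make the requested behaviour concrete, here is a rough sketch of "note
 the failure, then keep going".  It is not how deep-check is implemented: a
 local directory tree stands in for a Tahoe directory, and check_node and
 deep_check are hypothetical names made up for this example.

```shell
# Hypothetical per-node check: fails for names starting with "broken".
check_node() {
  case "$(basename "$1")" in
    broken*) return 1 ;;
    *)       return 0 ;;
  esac
}

# deep_check DIR FAILED_LOG
# Walks the tree under DIR, logs each failing node to FAILED_LOG, and
# continues with the remaining nodes instead of aborting.
deep_check() {
  for entry in "$1"/*; do
    [ -e "$entry" ] || continue        # empty directory: glob did not match
    if ! check_node "$entry"; then
      echo "$entry" >> "$2"            # note the failure...
      continue                         # ...but carry on with the siblings
    fi
    if [ -d "$entry" ]; then
      deep_check "$entry" "$2"         # healthy directory: descend into it
    fi
  done
}
```

 A failing directory is logged and skipped rather than descended into, since
 its children cannot be listed; everything else in the tree still gets
 checked.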

 If you guys want to bundle this tool or some clone or variant thereof into
 your packages you are more than welcome to do so.  We need something to
 actually keep people's data safe until this bug is fixed.

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/755#comment:39>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

