[tahoe-lafs-trac-stream] [Tahoe-LAFS] #755: Allow deep-check to continue after error, and: if there is an unrecoverable subdirectory, the deep-check report (both WUI and CLI) loses other information
Tahoe-LAFS
trac at tahoe-lafs.org
Tue Aug 21 21:48:05 UTC 2018
-----------------------------+---------------------------------------------
     Reporter: zooko         |     Owner: daira
         Type: defect        |    Status: new
     Priority: critical      | Milestone: soon
    Component: code-dirnodes |   Version: 1.4.1
   Resolution:               |  Keywords: usability error tahoe-check wui
Launchpad Bug:               |            verify repair
-----------------------------+---------------------------------------------
Comment (by tlhonmey):
OK, so "tahoe manifest" also gives up on the first error it encounters; it
just happens to encounter errors only on damaged directories. But it will
still bite you hard if you are actually stupid enough to rely on it.
So I've resorted to the following bash script:
{{{
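#!/bin/bash
tahoe="/home/tahoe/tahoe/bin/tahoe"
THREADS=5                    # background checks to start before waiting
FAILEDLOG="/tmp/failed.txt"  # nodes that could not be checked or repaired

recurser() {
    # Keep per-call state local so recursion does not clobber the caller's.
    local ITEM ITEMS
    local CHILDREN=()
    echo "checking directory: $1"
    # If both the check and the repair fail, give it 5 minutes before
    # continuing to let the grid come back up if this is a connection
    # failure. This prevents the entire script from finishing as failures
    # if the network connection goes down.
    "$tahoe" check --add-lease "$1" \
        || "$tahoe" check --add-lease --repair "$1" \
        || sleep 5m
    # Read the listing line by line so names containing spaces stay intact.
    mapfile -t ITEMS < <("$tahoe" ls -F "$1")
    for ITEM in "${ITEMS[@]}"; do
        echo "checking: ${1}${ITEM}"
        # "ls -F" marks directories with a trailing slash.
        if [[ "$ITEM" == */ ]]; then
            echo " Is a directory..."
            recurser "${1}${ITEM}"
        fi
        # Check in the background; repair on failure; log if that fails too.
        ( "$tahoe" check --add-lease "${1}${ITEM}" \
            || "$tahoe" check --repair --add-lease "${1}${ITEM}" \
            || echo "${1}${ITEM}" >> "$FAILEDLOG" ) &
        CHILDREN+=("$!")     # $! is the PID of the job just started
        if (( ${#CHILDREN[@]} >= THREADS )); then
            wait
            CHILDREN=()
        fi
    done
}

echo "If it blows up immediately when passed a URI make sure you end it with a /"
recurser "$1"
wait    # let any remaining background checks finish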
}}}
The careful observer will notice that this script calls "check --add-lease"
first and then only calls --repair if that returns an error. This is due to
another bug in the --repair functionality which I will be filing shortly.
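For anyone adapting the script, the check-first, repair-second order can be
pulled out into a small function. This is only a sketch: the names
check_then_repair and TAHOE are made up for illustration, and it assumes,
as the script above does, that "tahoe check" exits non-zero when the check
fails.

```bash
#!/bin/bash
# Hypothetical helper: TAHOE and check_then_repair are illustrative names.
TAHOE="${TAHOE:-tahoe}"

check_then_repair() {
    # Try a plain check first; fall back to --repair only if it fails.
    # Assumes "tahoe check" exits non-zero on failure, as the script does.
    "$TAHOE" check --add-lease "$1" && return 0
    "$TAHOE" check --add-lease --repair "$1"
}
```

The function returns the exit status of whichever call ran last, so a
caller can log anything that is still failing after the repair attempt,
for example by appending "$1" to a file when it returns non-zero.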
Is making deep-check note the unrepairable nodes, but then continue to
check the rest of the tree really that difficult? I wouldn't think the
average user should have to resort to writing their own tools to avoid
cascade failures of the storage system...
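The behaviour being asked for boils down to "remember the failure, keep
going, report at the end". A minimal sketch of that pattern in bash follows.
It is not how deep-check is implemented; check_all and CHECK_CMD are
hypothetical names, and the check command is assumed to exit non-zero for a
node it cannot verify.

```bash
#!/bin/bash
# Hypothetical sketch: check_all and CHECK_CMD are illustrative names only.
CHECK_CMD="${CHECK_CMD:-tahoe check}"

check_all() {
    # Run the check on every argument; remember failures instead of stopping.
    local node
    local failed=()
    for node in "$@"; do
        # CHECK_CMD is left unquoted on purpose so "tahoe check" splits
        # into a command and its argument.
        $CHECK_CMD "$node" || failed+=("$node")
    done
    # Report every failed node at the end, one per line.
    if (( ${#failed[@]} > 0 )); then
        printf '%s\n' "${failed[@]}"
        return 1
    fi
    return 0
}
```

Every node gets visited no matter how many fail, and the non-zero return
value still tells the caller that something needs attention.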
If you guys want to bundle this tool or some clone or variant thereof into
your packages you are more than welcome to do so. We need something to
actually keep people's data safe until this bug is fixed.
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/755#comment:39>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage