[tahoe-dev] [tahoe-lafs] #686: Search for lost share resulted in a directory popping up at unexpected place
tahoe-lafs
trac at allmydata.org
Sun Apr 26 06:36:12 PDT 2009
-----------------------+----------------------------------------------------
 Reporter:  [4-tea-2]  |           Owner:  nobody
     Type:  defect     |          Status:  new
 Priority:  major      |       Milestone:  undecided
Component:  unknown    |         Version:  1.4.1
 Keywords:             |   Launchpad_bug:
-----------------------+----------------------------------------------------
I'm currently running a private test grid which, over the last few weeks,
grew to 20 nodes. As test data I'm using my audio folder, which I backed up
in a few stages using "tahoe backup .../audio media:audio". The grid is
running "3-of-5" encoding, since all of the nodes are pretty reliable and
under my control.
A couple of days ago, I ran "tahoe deep-check --add-lease media:" and got a
summary indicating an unhealthy file. It took a few more deep-checks to
find the affected file: "tahoe deep-check media:" did not give the file
name, and "tahoe deep-check -v media:" did, but I missed it at the time
because my "grep -v Healthy" filter also matched the "Not Healthy" message
and dropped that line ;) In the end, running deep-check from the WUI gave
me the filename and the storage index.
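The filtering pitfall above can be avoided by testing how a line ends rather than whether it contains the word "Healthy". The following is only a sketch: the "Not Healthy" line is taken from this report, while the assumption that healthy entries end in ": Healthy" and the sample healthy path are mine. Any summary lines in the real output would pass through as well.

```python
def unhealthy_lines(lines):
    """Return deep-check lines whose status is anything but plain 'Healthy'."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Paths may contain ':' (e.g. backup timestamps), so only the
        # end of the line is inspected. Assumed healthy form: "<path>: Healthy".
        if line.endswith(": Healthy"):
            continue
        result.append(line)
    return result


sample = [
    "audio/Latest/example.wav: Healthy",  # hypothetical healthy entry
    "audio/Archives/2009-04-17_23:04:36Z/untagged or incomplete/Music/AIM/"
    "Aim - Fabriclive 17 (FLAC - CUE - EAC)/Aim - Fabriclive 17.wav: "
    "Not Healthy: 4 shares (enc 3-of-5)",
]

for line in unhealthy_lines(sample):
    print(line)
```

Fed with the saved output of "tahoe deep-check -v media:", this would keep the "Not Healthy" entry that a plain "grep -v Healthy" throws away.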
Local file:
.../audio/untagged or incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE
- EAC)/Aim - Fabriclive 17.wav
Affected file in grid:
media:audio/Archives/2009-04-17_23:04:36Z/untagged or
incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)/Aim -
Fabriclive 17.wav
Message from "tahoe deep-check -v media:":
audio/Archives/2009-04-17_23:04:36Z/untagged or incomplete/Music/AIM/Aim -
Fabriclive 17 (FLAC - CUE - EAC)/Aim - Fabriclive 17.wav: Not Healthy: 4
shares (enc 3-of-5)
Checking the file from the WUI gave me the list of the available shares,
1-4. Share 0 was gone.
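To make the "Not Healthy: 4 shares (enc 3-of-5)" message concrete, here is a small sketch of the bookkeeping. It assumes shares are numbered 0 to N-1 and that any K distinct shares are enough to read the file; the function name is made up for illustration and is not part of Tahoe.

```python
def share_status(found, k, n):
    """Return (missing share numbers, whether the file is still readable)."""
    expected = set(range(n))        # shares 0 .. n-1
    present = set(found) & expected
    missing = sorted(expected - present)
    recoverable = len(present) >= k  # any k distinct shares suffice
    return missing, recoverable


# Values from this report: 3-of-5 encoding, shares 1-4 were listed.
missing, recoverable = share_status({1, 2, 3, 4}, k=3, n=5)
print(missing, recoverable)
```

With the values reported here, only share 0 is missing and the file is still readable, since four shares remain and three are needed.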
Since I wanted to find out why the share vanished, zooko recommended
searching the .flog files for the storage index. I found 35 incident
reports. Most of those I checked were caused by connectivity problems (e.g.
the introducer was not reachable, because I opened the firewall on the
introducer only after installing and starting the tahoe node). None of the
.flog files contained the storage index of the unhealthy file.
The file <storage idx>/0 wasn't physically present in any of the storage/
folders on any of the nodes (/1, /2, /3, /4 were).
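A check like the one described above could be scripted per node roughly as follows. This is a sketch under assumptions: it only relies on what is stated here, namely that each share is a file named after its share number inside a directory named after the storage index, somewhere below the node's storage/ folder. The function name and the example arguments are hypothetical.

```python
import os


def find_shares(storage_root, storage_index):
    """Map each directory named after the storage index to its share numbers."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        if os.path.basename(dirpath) == storage_index:
            # Share files are assumed to be named by their number: 0, 1, 2, ...
            hits[dirpath] = sorted(int(f) for f in filenames if f.isdigit())
    return hits


# Hypothetical usage; the storage path and index are placeholders:
# for path, shares in find_shares("/path/to/node/storage", "<storage idx>").items():
#     print(path, shares)
```

Running it on every node and merging the results would give the overall picture of which share numbers physically exist.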
Well, it seems one of my nodes lost a share without good reason - could
that happen when a node is restarted while a share is uploading?
But here's the really weird thing:
marc@bong:~$ tahoe ls -l media:audio
drwx - Apr 13 00:02 Archives
dr-x - Apr 13 00:05 Latest
drwx - Apr 25 00:59 untagged or incomplete
marc@bong:~$ tahoe manifest media:audio/"untagged or incomplete"
URI:DIR2:...
URI:DIR2:... Music
URI:DIR2:... Music/AIM
URI:DIR2:... Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)
For reasons which are a complete mystery to me, part of the directory
structure of the file with the lost share appeared in the target folder of
"tahoe backup .../audio media:audio".
Not the whole directory tree was duplicated, only the folders leading to
the affected file. The directory Music/ contains many more files and
directories. Sadly, some of the filenames contain non-ASCII characters
(diacritics encoded as UTF-8), which triggers a "UnicodeEncodeError:
'ascii' codec can't encode character u'\xe4' in position 7: ordinal not in
range(128)" when I try to "tahoe ls" the directory. I can access the files
from the WUI, though.
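The error quoted above is what Python raises when a string containing a non-ASCII character is encoded with the 'ascii' codec. The sketch below reproduces that class of failure in isolation, written for Python 3. The filename is invented so that 'ä' (U+00E4) sits at index 7. It illustrates the failure mode only and says nothing about where in the CLI the encoding happens.

```python
def first_non_ascii(name):
    """Return the index of the first character 'ascii' cannot encode, or None."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc.start  # position reported in the error message
    return None


name = "Example\u00e4.flac"  # hypothetical filename, 'ä' at index 7

print(first_non_ascii(name))   # position of the offending character
print(name.encode("utf-8"))    # encoding as UTF-8 succeeds
```

Encoding the same name as UTF-8 works, which is consistent with the files being accessible through the WUI.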
I have not tried to repair the unhealthy file yet, because I didn't want to
spoil the chance of finding the original problem.
I can supply additional info (incident reports etc.) if needed.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/686>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid