#686 assigned defect

Search for lost share resulted in a directory popping up at unexpected place — at Initial Version

Reported by: [4-tea-2]
Owned by: nobody
Priority: major
Milestone: soon
Component: code-frontend-web
Version: 1.4.1
Keywords: integrity error
Cc:
Launchpad Bug:

Description

I'm currently running a private test grid which, over the last few weeks, grew to 20 nodes. As test data, I'm using my audio folder, which I backed up in a few stages using "tahoe backup .../audio media:audio". The grid is running "3-of-5" encoding, since all of the nodes are pretty reliable and under my control.

A couple of days ago, I ran a "tahoe deep-check --add-lease media:" and got a summary indicating an unhealthy file. I ran a few more deep-checks until I found the affected file: "tahoe deep-check media:" did not give the file name; "tahoe deep-check -v media:" did give it, but at the time I didn't see it because my "grep -v Healthy" filter also matched (and so dropped) the "Not Healthy" message ;). Finally, running deep-check from the WUI gave me the filename and the storage index.
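The filtering pitfall above can be sketched in a few lines. This is only an illustration: it assumes the verbose output has one "path: status" line per file, as in the "Not Healthy" message quoted in this ticket, and that healthy files are reported as "path: Healthy". The function name unhealthy_lines and the sample paths are made up for the example.

```python
def unhealthy_lines(lines):
    """Keep only lines that report a 'Not Healthy' status.

    Assumption: each line looks like '<path>: Healthy' or
    '<path>: Not Healthy: ...', as in the message quoted in this ticket.
    """
    return [line for line in lines if ": Not Healthy" in line]


# Hypothetical sample output, shaped like the message in this ticket.
sample = [
    "audio/a.wav: Healthy",
    "audio/b.wav: Not Healthy: 4 shares (enc 3-of-5)",
]

# Dropping every line that contains "Healthy" (what "grep -v Healthy" does)
# also drops the "Not Healthy" line, so nothing is left.
naive = [line for line in sample if "Healthy" not in line]

print(naive)
print(unhealthy_lines(sample))
```

Selecting on the "Not Healthy" marker, rather than excluding "Healthy", keeps the one line that matters.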

Local file: .../audio/untagged or incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)/Aim - Fabriclive 17.wav

Affected file in grid: media:audio/Archives/2009-04-17_23:04:36Z/untagged or incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)/Aim - Fabriclive 17.wav

Message from "tahoe deep-check -v media:": audio/Archives/2009-04-17_23:04:36Z/untagged or incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)/Aim - Fabriclive 17.wav: Not Healthy: 4 shares (enc 3-of-5)
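To make the numbers in that message concrete, here is a small sketch that parses the "N shares (enc k-of-n)" part and works out how much margin is left. The message format is taken from the line quoted above; the function name share_margin and the returned field names are hypothetical. With k-of-n encoding any k shares are enough to recover the file, so 4 shares with 3-of-5 means one share is missing and one more could be lost before the file becomes unrecoverable.

```python
import re

# Matches e.g. "4 shares (enc 3-of-5)" as quoted in this ticket.
STATUS_RE = re.compile(r"(\d+) shares \(enc (\d+)-of-(\d+)\)")


def share_margin(message):
    """Extract share counts from a deep-check status message."""
    match = STATUS_RE.search(message)
    if match is None:
        raise ValueError("no share count found in: %r" % (message,))
    found, needed, total = (int(group) for group in match.groups())
    return {
        "found": found,                  # shares located by the check
        "needed": needed,                # k: shares required to recover
        "total": total,                  # n: shares originally produced
        "missing": total - found,        # shares that have disappeared
        "spare": found - needed,         # further losses that can be tolerated
        "recoverable": found >= needed,
    }


print(share_margin("Aim - Fabriclive 17.wav: Not Healthy: 4 shares (enc 3-of-5)"))
```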

Checking the file from the WUI gave me the list of the available shares, 1-4. Share 0 was gone.

Since I wanted to find out why the share vanished, zooko recommended searching the .flog files for the storage index. I found 35 incident reports. Most of those I checked were caused by connectivity problems (e.g. introducer not reachable, because I opened the firewall on the introducer only after installing and starting the tahoe node). None of the .flog files contained the storage index of the unhealthy file.

The file <storage idx>/0 wasn't physically present in any of the storage/ folders on any of the nodes (/1, /2, /3, /4 were).
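A check like the one above could be scripted roughly as follows. This is a sketch under assumptions: it only relies on what is described here, namely that shares are stored as files named by share number inside a directory named after the storage index, somewhere below each node's storage/ folder. The function name find_shares and its arguments are hypothetical, and the list of storage folders has to be supplied by hand (or gathered from each node).

```python
import os


def find_shares(storage_roots, storage_index):
    """Map each directory named after the storage index to its share files.

    Assumption: share files are named by share number ("0", "1", ...)
    and live in a directory whose name is the storage index.
    """
    found = {}
    for root in storage_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if os.path.basename(dirpath) == storage_index:
                found[dirpath] = sorted(filenames)
    return found
```

Running it over every node's storage folder and collecting the share numbers shows at a glance which of 0 to 4 is absent.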

Well, it seems one of my nodes lost a share without good reason - could that happen when a node is restarted while a share is uploading?

But here's the real weird thing:

marc@bong:~$ tahoe ls -l media:audio
drwx - Apr 13 00:02 Archives
dr-x - Apr 13 00:05 Latest
drwx - Apr 25 00:59 untagged or incomplete
marc@bong:~$ tahoe manifest media:audio/"untagged or incomplete"
URI:DIR2:...
URI:DIR2:... Music
URI:DIR2:... Music/AIM
URI:DIR2:... Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)

For reasons which are a complete mystery to me, part of the directory structure of the file with the lost share appeared in the target folder of "tahoe backup .../audio media:audio".

Not the whole directory tree was duplicated, only the folders leading to the affected file. The original directory Music/ contains many more files and directories. Sadly, some of the filenames contain UTF-8 diacritics, triggering a "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 7: ordinal not in range(128)" when I try to "tahoe ls" the directory. I can access the files from the WUI, though.
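For reference, the class of error quoted above is easy to reproduce outside of tahoe. The sketch below is written for Python 3 (the u'\xe4' notation in the message comes from Python 2, so the exact wording differs) and uses a made-up filename; it only shows that a name containing "ä" cannot be encoded with the ascii codec while UTF-8 handles it. It says nothing about where in the CLI the failing conversion happens.

```python
def encodes_as(name, encoding):
    """Return True if the name can be encoded with the given codec."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical filename containing U+00E4 (a with diaeresis).
filename = "M\u00e4rchen.flac"

print(encodes_as(filename, "ascii"))
print(encodes_as(filename, "utf-8"))
```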

I have not tried to repair the unhealthy file yet, because I didn't want to spoil the chance of finding the original problem.

I can supply additional info (incident reports etc.) if needed.

Change History (1)

Changed at 2009-04-26T14:14:15Z by [4-tea-2]
