[tahoe-lafs-trac-stream] [tahoe-lafs] #686: Search for lost share resulted in a directory popping up at unexpected place

tahoe-lafs trac at tahoe-lafs.org
Tue May 14 22:18:43 UTC 2013


#686: Search for lost share resulted in a directory popping up at unexpected
place
-----------------------------------+-----------------------------
     Reporter:  [4-tea-2]          |      Owner:  daira
         Type:  defect             |     Status:  assigned
     Priority:  major              |  Milestone:  1.11.0
    Component:  code-frontend-web  |    Version:  1.4.1
   Resolution:                     |   Keywords:  integrity error
Launchpad Bug:                     |
-----------------------------------+-----------------------------
Changes (by daira):

 * owner:  nobody => daira
 * status:  new => assigned
 * milestone:  soon => 1.11.0


New description:

 I'm currently running a private test grid which, over the last few weeks,
 grew to 20 nodes. As test data, I'm using my audio folder, which I backed
 up in a few stages using "tahoe backup .../audio media:audio". The grid is
 running "3-of-5", since all of the nodes are pretty reliable and under my
 control.

 A couple of days ago, I ran "tahoe deep-check --add-lease media:" and got
 a summary indicating an unhealthy file. I ran a few more deep-checks until
 I found the affected file. "tahoe deep-check media:" did not give the file
 name. "tahoe deep-check -v media:" did, but I didn't see it at the time
 because "grep -v Healthy" also filtered out the "Not Healthy" message ;)
 Finally, running deep-check from the WUI gave me the filename and the
 storage index.
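
 The grep pitfall comes from substring matching: "Not Healthy" contains
 "Healthy", so "grep -v Healthy" drops both kinds of line. A minimal Python
 sketch of a stricter filter, assuming each line of "tahoe deep-check -v"
 output ends in ": Healthy" or ": Not Healthy: <reason>" as in the message
 quoted further down. The function name and the sample lines are
 hypothetical:

```python
def unhealthy_lines(lines):
    """Return only the lines that report an object as not healthy."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        # Match the explicit "Not Healthy" marker instead of excluding
        # "Healthy", which is a substring of "Not Healthy".
        if ": Not Healthy" in line:
            result.append(line)
    return result


# Hypothetical sample input, shaped like the message in this ticket.
sample = [
    "audio/a.wav: Healthy",
    "audio/b.wav: Not Healthy: 4 shares (enc 3-of-5)",
]
for line in unhealthy_lines(sample):
    print(line)
```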

 Local file:
 .../audio/untagged or incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE
 - EAC)/Aim - Fabriclive 17.wav

 Affected file in grid:
 media:audio/Archives/2009-04-17_23:04:36Z/untagged or
 incomplete/Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)/Aim -
 Fabriclive 17.wav

 Message from "tahoe deep-check -v media:":
 audio/Archives/2009-04-17_23:04:36Z/untagged or incomplete/Music/AIM/Aim -
 Fabriclive 17 (FLAC - CUE - EAC)/Aim - Fabriclive 17.wav: Not Healthy: 4
 shares (enc 3-of-5)

 Checking the file from the WUI gave me the list of the available shares,
 1-4. Share 0 was gone.
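
 For reference, the arithmetic behind "Not Healthy: 4 shares (enc 3-of-5)"
 can be sketched as follows. With 3-of-5 encoding, any 3 of the 5 shares
 are enough to reconstruct the file, so losing share 0 should leave the
 file recoverable but below full redundancy. This is an illustrative
 sketch, not Tahoe-LAFS code. The function name is hypothetical, and
 treating "healthy" as "all shares present" is an assumption:

```python
def share_status(k, n, found):
    """Summarize share availability for k-of-n encoding.

    found is the set of share numbers that were located.
    """
    expected = set(range(n))
    present = set(found) & expected
    return {
        "missing": sorted(expected - present),
        "recoverable": len(present) >= k,  # any k distinct shares suffice
        "healthy": present == expected,    # assumption: all n shares present
    }


print(share_status(3, 5, {1, 2, 3, 4}))
```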

 Since I wanted to find out why the share vanished, zooko recommended
 searching the .flog files for the storage index. I found 35 incident
 reports. Most of those I checked were caused by connectivity problems
 (e.g. introducer not reachable, because I opened the firewall on the
 introducer only after installing and starting the tahoe node). None of
 the .flog files contained the storage index of the unhealthy file.

 The file <storage idx>/0 wasn't physically present in any of the storage/
 folders on any of the nodes (/1, /2, /3, /4 were).
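
 A check like this can be scripted. The following is a rough sketch only:
 it assumes each node keeps shares under
 <node dir>/storage/shares/<prefix>/<storage idx>/<share number>, and that
 all node directories are reachable from one machine. The function name
 and the node paths passed to it are hypothetical:

```python
import os


def find_shares(node_dirs, storage_index):
    """Map each node directory to the share files it holds for storage_index."""
    found = {}
    for node_dir in node_dirs:
        # Assumed layout: <node dir>/storage/shares/<prefix>/<storage idx>/
        shares_root = os.path.join(node_dir, "storage", "shares")
        for dirpath, dirnames, filenames in os.walk(shares_root):
            if os.path.basename(dirpath) == storage_index:
                found[node_dir] = sorted(filenames)
    return found
```

 Nodes that hold no share for the storage index are simply absent from the
 returned mapping, so a missing share number shows up as a gap across all
 values.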

 Well, it seems one of my nodes lost a share without good reason - could
 that happen when a node is restarted while a share is uploading?

 But here's the really weird thing:

 marc@bong:~$ tahoe ls -l media:audio
 drwx - Apr 13 00:02               Archives
 dr-x - Apr 13 00:05                 Latest
 drwx - Apr 25 00:59 untagged or incomplete
 marc@bong:~$ tahoe manifest media:audio/"untagged or incomplete"
 URI:DIR2:...
 URI:DIR2:... Music
 URI:DIR2:... Music/AIM
 URI:DIR2:... Music/AIM/Aim - Fabriclive 17 (FLAC - CUE - EAC)

 For reasons which are a complete mystery to me, part of the directory
 structure of the file with the lost share appeared in the target folder of
 "tahoe backup .../audio media:audio".

 Not the whole directory tree was duplicated, only the folders leading to
 the affected file. The directory Music/ contains many more files and
 directories. Sadly, some of the filenames contain UTF-8 diacritics,
 triggering a "UnicodeEncodeError: 'ascii' codec can't encode character
 u'\xe4' in position 7: ordinal not in range(128)" when I try to "tahoe ls"
 the directory. I can access the files from the WUI, though.
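
 The error message is consistent with a non-ASCII name being encoded with
 the 'ascii' codec before it is printed, though where exactly this happens
 in the CLI is not established here. A minimal sketch reproducing the same
 class of error in Python 3 (the u'\xe4' notation in the message comes
 from Python 2). The name used is hypothetical, chosen so that the
 offending character sits at position 7:

```python
name = "Gesamt-\xe4lbum"  # hypothetical name; "\xe4" (a-umlaut) is at index 7

try:
    name.encode("ascii")
except UnicodeEncodeError as err:
    # err.start is the index of the first character that could not be encoded
    print(err.reason, "at position", err.start)

# Encoding the same name as UTF-8 succeeds.
encoded = name.encode("utf-8")
print(encoded)
```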

 I have not tried to repair the unhealthy file yet, because I didn't want
 to spoil the chance to find the original problem.

 I can supply additional info (incident reports etc.) if needed.

--

Comment:

 Ugh, this is a horrible bug that should have had more attention; how did
 this slip through the cracks?

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/686#comment:8>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

