[volunteergrid2-l] some error

Fri Dec 16 20:39:06 UTC 2011

What version of tahoe-lafs are you using? I received a similar deep-check
error after I moved to 1.9.0, and decided to go back to 1.8.3. I'm not
sure that will solve your problem, but you might want to try that if you
are using 1.9.0.

On Fri, Dec 16, 2011 at 2:21 PM, Christoph Langguth <
christoph at rosenkeller.org> wrote:

> On 16.12.2011 20:43, Iantcho Vassilev wrote:
>
>> Hi guys,
>>
>> I get a strange error:
>>
>> [ianchov at localhost ~]$ ./allmydata-tahoe-1.9.0/bin/tahoe deep-check
>> --repair --add-lease -v ianchov2:cveti
>> '<root>': not healthy
>>  repair successful
>> ERROR: UncoordinatedWriteError()
>> "[Failure instance: Traceback (failure with no frames): <class
>> 'allmydata.mutable.common.UncoordinatedWriteError'>: "
>>
>
> Hi Iantcho,
>
> No ideas, unfortunately, just a "me too".
>
> I have also seen this happening with mutable directories. I haven't found
> the reason for it yet, so I'm just guessing.
>
> The name (UncoordinatedWriteError) seems to indicate an inconsistency at
> the storage layer. The most prominent example would be two programs
> simultaneously writing to a directory (uploading conflicting information
> about the directory's contents at the same time). However, in every case
> where I have encountered this, it was definitely NOT caused by concurrent
> writes, because there was only one program accessing tahoe. So, problem 1:
> I don't have a clue as to WHY this is happening either.
>
> From the logs (.tahoe/logs/incidents)* it seems like the actual problem is
> some "surprise share" somewhere: tahoe-lafs encounters shares for the
> file/directory in question which it did not expect -- and then fails. In
> theory, this seems to validate the "concurrent write" hypothesis -- but
> again, on every occasion where I encountered it there was no concurrency.
> The only thing that I can think of is some kind of intermittent network
> problem, which causes uploads to fail temporarily, and where subsequent
> retries could get in the way of normal operations.
>
> However, the biggest problem, which makes this really *nasty*, is that
> there is no real solution to it. "tahoe check" won't work, and "tahoe
> deep-check" won't work either. The "--repair" option also does not help,
> because the error obviously occurs even before repair can be attempted.
> Essentially, a directory affected by this has become useless forever, and
> cannot be repaired.
>
> The only workaround that I have found so far is to completely dismiss the
> directory, "copying" (actually linking) its contents to a new directory,
> and then throwing the original away. Below is pseudocode without tahoe, but
> it translates to tahoe commands in a straightforward way:
>
> mkdir ianchov2:cveti2
> for i in `ls ianchov2:cveti`; do ln ianchov2:cveti/$i ianchov2:cveti2/; done
> unlink ianchov2:cveti
> mv ianchov2:cveti2 ianchov2:cveti
>
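The pseudocode above could be turned into actual tahoe invocations along
the following lines. This is only a minimal sketch: it builds the command
lines as a dry run and does not execute anything. The function name
relink_commands and the example entry names are hypothetical, and the exact
argument forms accepted by "tahoe ln", "tahoe unlink" and "tahoe mv" are
assumed to mirror the pseudocode, so please check "tahoe ln --help" and
friends on your version before running the output.

```python
import shlex
from typing import List


def relink_commands(broken: str, entries: List[str], fresh: str) -> List[List[str]]:
    """Build (but do not run) the commands for the relink workaround.

    broken  -- the affected directory, e.g. "ianchov2:cveti"
    entries -- child names, e.g. as listed by `tahoe ls ianchov2:cveti`
    fresh   -- a new, temporary directory, e.g. "ianchov2:cveti2"
    """
    # 1. Create the replacement directory.
    cmds = [["tahoe", "mkdir", fresh]]
    # 2. Link every child into it; nothing is re-uploaded.
    for name in entries:
        cmds.append(["tahoe", "ln", f"{broken}/{name}", f"{fresh}/{name}"])
    # 3. Drop the affected directory and move the new one into its place.
    cmds.append(["tahoe", "unlink", broken])
    cmds.append(["tahoe", "mv", fresh, broken])
    return cmds


if __name__ == "__main__":
    # Hypothetical entry names, for illustration only.
    for cmd in relink_commands("ianchov2:cveti", ["a.jpg", "b c.jpg"], "ianchov2:cveti2"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building argument lists instead of a shell string keeps entry names that
contain spaces intact, which the backtick loop in the pseudocode would
split.
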
> This is still a far-from-optimal solution, but it is the only one that
> I know of at the moment (there is remarkably little, actually close to
> nothing, to be found on the internet). It still takes quite some time to
> link the directory entries (about 10 secs/entry the last time I had to use
> it), but at least you don't need to re-upload everything.
>
>
> HTH
> Chris
>
>
> PS:
> (*) It took me a while to find this out, so maybe it is helpful: The
> incident files can be read using flogtool (Ubuntu: apt-get install
> python-foolscap) like so:
> flogtool dump incident-2011-12-09--15-30-40Z-p3dg7ba.flog.bz2 | less
>
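To go through all incident files rather than one at a time, something like
the sketch below could list them and build the matching "flogtool dump"
command for each. It assumes the node directory layout mentioned above
(logs/incidents under the node directory) and file names that start with
"incident-"; the function name incident_dump_commands is hypothetical. It
only builds the commands and does not run flogtool.

```python
from pathlib import Path
from typing import List


def incident_dump_commands(node_dir: Path) -> List[List[str]]:
    """Return one `flogtool dump FILE` command per incident file found."""
    incident_dir = node_dir / "logs" / "incidents"
    # Matches e.g. incident-2011-12-09--15-30-40Z-p3dg7ba.flog.bz2
    files = sorted(incident_dir.glob("incident-*.flog*"))
    return [["flogtool", "dump", str(path)] for path in files]


if __name__ == "__main__":
    # Assumes the default node directory ~/.tahoe
    for cmd in incident_dump_commands(Path.home() / ".tahoe"):
        print(" ".join(cmd))
```

Each printed line can then be run by hand and piped through less.
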