Opened at 2012-01-13T07:22:36Z
Closed at 2012-03-12T20:02:49Z
#1655 closed defect (duplicate)
Reproducible UncoordinatedWriteError on repair
Reported by: | ianchov | Owned by: | somebody |
---|---|---|---|
Priority: | critical | Milestone: | 1.9.2 |
Component: | code | Version: | 1.9.0 |
Keywords: | ucwe repair regression | Cc: | |
Launchpad Bug: |
Description (last modified by zooko)
Hi
Tahoe 1.9.1 (same with 1.9.0)
[ianchov@localhost]$ ./bin/tahoe deep-check --repair --add-lease -v XYZ:XYZ
'<root>': not healthy
repair successful
ERROR: UncoordinatedWriteError()
"[Failure instance: Traceback (failure with no frames): <class 'allmydata.mutable.common.UncoordinatedWriteError'>: "
Attachments (2)
Change History (18)
comment:1 Changed at 2012-02-16T17:05:36Z by davidsarah
- Keywords ucwe repair leases added
- Priority changed from critical to major
comment:2 Changed at 2012-02-16T17:42:35Z by ianchov
C:\Users\ianchov>C:\Python26\python.exe X:\allmydata-tahoe-1.9.1\bin\tahoe deep-check --repair -v -d X:\tahoe cveti:
ERROR: UncoordinatedWriteError()
"[Failure instance: Traceback (failure with no frames): <class 'allmydata.mutable.common.UncoordinatedWriteError'>: "
If it is run without repair and add-lease:
C:\Users\ianchov>C:\Python26\python.exe X:\allmydata-tahoe-1.9.1\bin\tahoe deep-check -v -d X:\tahoe XXXX:
'<root>': Unhealthy: some versions are unrecoverable
'Archives': Unhealthy: some versions are unrecoverable 10 shares (enc 5-of-12)
'Archives/2012-02-01_08:25:59Z': Not Healthy: 10 shares (enc 5-of-12)
'Archives/2012-02-01_08:25:59Z/Local Disk (C) - Shortcut.lnk': Not Healthy: 10 shares (enc 5-of-12).....
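For anyone reading the report above: "10 shares (enc 5-of-12)" should mean that 10 shares were found for a file encoded so that any 5 of 12 shares are enough to rebuild it. Such a file is still recoverable but is missing 2 shares. Below is a minimal sketch for summarising these lines. The line format is an assumption taken only from the output quoted in this comment, and `classify` is a hypothetical helper, not part of Tahoe-LAFS.

```python
import re

# Assumed line shape, copied from the output quoted above, e.g.
#   'Archives/2012-02-01_08:25:59Z': Not Healthy: 10 shares (enc 5-of-12)
LINE_RE = re.compile(
    r"^'(?P<path>.*)': (?P<status>[^:]+): "
    r"(?P<found>\d+) shares \(enc (?P<k>\d+)-of-(?P<n>\d+)\)"
)


def classify(line):
    """Hypothetical helper: return share counts for one report line.

    Returns None for lines that do not carry a share count.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    found = int(m.group("found"))
    needed = int(m.group("k"))
    total = int(m.group("n"))
    return {
        "path": m.group("path"),
        "status": m.group("status"),
        "found": found,
        "needed": needed,
        "total": total,
        # Any `needed` shares are enough to rebuild the file.
        "recoverable": found >= needed,
        # Shares a repair would have to regenerate.
        "missing": max(total - found, 0),
    }
```

Feeding it the saved output of `tahoe deep-check -v` line by line would give a quick list of which entries are merely short of shares and which are below the recovery threshold.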
comment:3 Changed at 2012-02-17T00:09:30Z by davidsarah
- Keywords leases removed
comment:4 Changed at 2012-02-17T00:10:45Z by davidsarah
- Summary changed from Cannot deep-check url to Reproducible UncoordinatedWriteError on repair
comment:5 Changed at 2012-02-17T16:31:16Z by zooko
- Priority changed from major to critical
comment:6 Changed at 2012-02-18T17:47:35Z by kevan
I don't think 1.9.1 has the fix for #1628. Can you try a deep check + repair with a Tahoe-LAFS that has that fix applied (preferably the current git master) and let us know if you can still reproduce the error?
comment:7 Changed at 2012-02-18T23:04:03Z by gyver
Same problem here: "tahoe deep-check --add-lease" works on an alias, but "tahoe deep-check --repair" throws UncoordinatedWriteError. See ticket #1656 for a probably related bug.
I separated the two so that I can at least maintain my backups on the storage network while waiting for a solution.
Unfortunately, I have to deploy the solution to 8 servers, and for this I use Gentoo ebuilds, so testing git master is a bit tricky (although possible if time allows).
One note: one of my storage nodes had network connection problems, which most probably happened during "tahoe cp": I was storing very large tar.xz files that can take more than one hour to upload. The problem started at about the same time as these connection issues.
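The workaround described in this comment (renewing leases and repairing in two separate runs) could be scripted roughly as follows. This is only a sketch: the alias `backups:` and the helper names are made up for illustration, and the only `tahoe` options used are the ones already quoted in this ticket.

```python
import subprocess


def build_passes(target, tahoe="tahoe"):
    """Return the two deep-check command lines, lease renewal first.

    `target` is an alias or cap; "backups:" below is a made-up example.
    """
    lease_pass = [tahoe, "deep-check", "--add-lease", target]
    repair_pass = [tahoe, "deep-check", "--repair", target]
    return [lease_pass, repair_pass]


def run_passes(target, tahoe="tahoe"):
    """Run both passes and collect exit codes.

    A failing repair pass does not stop the lease pass from having run.
    """
    results = []
    for cmd in build_passes(target, tahoe):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
    return results


if __name__ == "__main__":
    for cmd, code in run_passes("backups:"):
        print(" ".join(cmd), "->", code)
```

Running the lease pass first means the leases are renewed even if the repair pass fails with UncoordinatedWriteError.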
comment:8 Changed at 2012-03-03T19:23:44Z by ianchov
.....'Archives/2012-02-01_08:25:59Z/DESKTOP/Dokumentatsia_en.efektivnost_Kostinbrod/Prilojenie_3_deklaracia 47,1.doc': not healthy
repair successful
"ERROR: AttributeError('NoneType' object has no attribute 'callRemote')"
"[Failure instance: Traceback: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'callRemote'"
X:\allmydata-tahoe-1.9.1\support\Lib\site-packages\foolscap-0.6.3-py2.6.egg\foolscap\call.py:677:_done
X:\allmydata-tahoe-1.9.1\support\Lib\site-packages\foolscap-0.6.3-py2.6.egg\foolscap\call.py:60:complete
X:\allmydata-tahoe-1.9.1\support\Lib\site-packages\twisted-10.1.0-py2.6-win-amd64.egg\twisted\internet\defer.py:318:callback
X:\allmydata-tahoe-1.9.1\support\Lib\site-packages\twisted-10.1.0-py2.6-win-amd64.egg\twisted\internet\defer.py:424:_startRunCallbacks
--- <exception caught here> ---
X:\allmydata-tahoe-1.9.1\support\Lib\site-packages\twisted-10.1.0-py2.6-win-amd64.egg\twisted\internet\defer.py:441:_runCallbacks
x:\allmydata-tahoe-1.9.1\src\allmydata\immutable\upload.py:553:_got_response
x:\allmydata-tahoe-1.9.1\src\allmydata\immutable\upload.py:420:_loop
x:\allmydata-tahoe-1.9.1\src\allmydata\immutable\upload.py:105:query
C:\Users\ianchov>C:\Python26\python.exe X:\allmydata-tahoe-1.9.1\bin\tahoe deep-check --repair --add-lease -v -d X:\tahoe cveti:
'<root>': not healthy
repair successful
ERROR: UncoordinatedWriteError()
"[Failure instance: Traceback (failure with no frames): <class 'allmydata.mutable.common.UncoordinatedWriteError'>: "
C:\Users\ianchov>C:\Python26\python.exe X:\allmydata-tahoe-1.9.1\bin\tahoe deep-check --repair --add-lease -v -d X:\tahoe cveti:
'<root>': not healthy
repair successful
ERROR: UncoordinatedWriteError()
"[Failure instance: Traceback (failure with no frames): <class 'allmydata.mutable.common.UncoordinatedWriteError'>: "
comment:9 Changed at 2012-03-05T22:16:04Z by zooko
- Description modified (diff)
comment:10 Changed at 2012-03-05T22:17:28Z by zooko
- Keywords regression added
- Milestone changed from undecided to 1.9.2
I'm adding the keyword regression on the assumption that this is related to the regressions in 1.9.x, and I'm adding it to the 1.9.2 Milestone. Please let me know if you know that assumption is incorrect.
comment:11 Changed at 2012-03-06T06:35:56Z by ianchov
I am not sure what I should do.
Please give me a hint on how to debug this.
comment:12 Changed at 2012-03-10T18:05:50Z by kevan
ianchov: Are those incidents from a Tahoe-LAFS with the fix for #1628 applied? At first inspection, the UncoordinatedWriteErrors in those files look like issue #1628.
comment:13 Changed at 2012-03-12T17:14:17Z by ianchov
Confirmed!
I think it is the same as #1628. The latest git revision, e27423e4a9, does not have this problem.
comment:14 Changed at 2012-03-12T17:14:32Z by ianchov
- Resolution set to fixed
- Status changed from new to closed
comment:15 Changed at 2012-03-12T20:02:41Z by davidsarah
- Resolution fixed deleted
- Status changed from closed to reopened
comment:16 Changed at 2012-03-12T20:02:49Z by davidsarah
- Resolution set to duplicate
- Status changed from reopened to closed
Is this problem reproducible, and does it happen without --add-lease?