Opened at 2012-07-06T23:32:42Z
Last modified at 2016-08-05T22:02:46Z
#1791 new defect
UploadUnhappinessError with available storage nodes > shares.happy — at Initial Version
Reported by: | gyver | Owned by: | davidsarah |
---|---|---|---|
Priority: | major | Milestone: | soon |
Component: | code-peerselection | Version: | 1.9.2 |
Keywords: | servers-of-happiness upload error | Cc: | zooko, vladimir@… |
Launchpad Bug: |
Description
The error happened with 1.9.1 too. I just upgraded to 1.9.2 and fixed some files/dir that 1.9.1 couldn't repair reliably hoping the following problem would get away too (it didn't).
There are some peculiarities in my setup: I use USB disks connected to a single server so all storage nodes are running on the same server although physically on a disk that can easily be sent away for increasing the durability of the whole storage. At the time of failure there were 7 such storage nodes in my setup and my whole store was fully repaired on these 7 nodes, all the content is/was uploaded with shares.needed = 4 shares.happy = 6 shares.total = 6
Although 7 >= 6 I get this error when trying to tahoe cp a new file:
Traceback (most recent call last): File \"/usr/lib64/python2.7/site-packages/foolscap/call.py\", line 677, in _done self.request.complete(res) File \"/usr/lib64/python2.7/site-packages/foolscap/call.py\", line 60, in complete self.deferred.callback(res) File \"/usr/lib64/python2.7/site-packages/twisted/internet/defer.py\", line 361, in callback self._startRunCallbacks(result) File \"/usr/lib64/python2.7/site-packages/twisted/internet/defer.py\", line 455, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File \"/usr/lib64/python2.7/site-packages/twisted/internet/defer.py\", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) File \"/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py\", line 553, in _got_response return self._loop() File \"/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py\", line 404, in _loop return self._failed(\"%s (%s)\" % (failmsg, self._get_progress_message())) File \"/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py\", line 566, in _failed raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: shares could be placed on only 5 server(s) such that any 4 of them have enough shares to recover the file, but we were asked to place shares on at least 6 such servers. (placed all 6 shares, want to place shares on at least 6 servers such that any 4 of them have enough shares to recover the file, sent 5 queries to 5 servers, 5 queries placed some shares, 0 placed none (of which 0 placed none due to the server being full and 0 placed none due to an error))
I recently found out about flogtool, so I run it on the client node (which is one of the 7 storage nodes btw), I only pasted the last part from CHKUploader (I can attach the whole log if needs be):
01:04:01.314 L20 []#2339 CHKUploader starting 01:04:01.314 L20 []#2340 starting upload of <allmydata.immutable.upload.EncryptAnUploadable instance at 0x2c9b5a8> 01:04:01.314 L20 []#2341 creating Encoder <Encoder for unknown storage index> 01:04:01.314 L20 []#2342 file size: 4669394 01:04:01.314 L10 []#2343 my encoding parameters: (4, 6, 6, 131072) 01:04:01.314 L20 []#2344 got encoding parameters: 4/6/6 131072 01:04:01.314 L20 []#2345 now setting up codec 01:04:01.348 L20 []#2346 using storage index k5ga2 01:04:01.348 L20 []#2347 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): starting 01:04:01.363 L10 []#2348 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server zp6jpfeu: alreadygot=(0,), allocated=() 01:04:01.372 L10 []#2349 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server pa2myijh: alreadygot=(2,), allocated=(1,) 01:04:01.375 L20 []#2350 storage: allocate_buckets k5ga2suaoz7gju523f5ni3mswe 01:04:01.377 L10 []#2351 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server omkzwfx5: alreadygot=(3,), allocated=() 01:04:01.381 L10 []#2352 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server wo6akhxt: alreadygot=(4,), allocated=() 01:04:01.404 L10 []#2353 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server ughwvrtu: alreadygot=(), allocated=(5,) 01:04:01.405 L25 []#2354 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): server selection unsuccessful for <Tahoe2ServerSelector for upload k5ga2>: shares could be placed on only 5 server(s) such that any 4 of them have enough shares to recover the file, but we were asked to place shares on at least 6 such servers. (placed all 6 shares, want to place shares on at least 6 servers such that any 4 of them have enough shares to recover the file, sent 5 queries to 5 servers, 5 queries placed some shares, 0 placed none (of which 0 placed none due to the server being full and 0 placed none due to an error)), merged=sh0: zp6jpfeu, sh1: pa2myijh, sh2: pa2myijh, sh3: omkzwfx5, sh4: wo6akhxt, sh5: ughwvrtu 01:04:01.407 L20 []#2355 web: 127.0.0.1 PUT /uri/[CENSORED].. 500 1644