[tahoe-lafs-trac-stream] [tahoe-lafs] #1791: UploadUnhappinessError with available storage nodes > shares.happy

tahoe-lafs trac at tahoe-lafs.org
Fri Jul 6 23:32:43 UTC 2012


#1791: UploadUnhappinessError with available storage nodes > shares.happy
---------------------+----------------------------
 Reporter:  gyver    |          Owner:  davidsarah
     Type:  defect   |         Status:  new
 Priority:  major    |      Milestone:  undecided
Component:  unknown  |        Version:  1.9.2
 Keywords:  happy    |  Launchpad Bug:
---------------------+----------------------------
 The error happened with 1.9.1 too. I just upgraded to 1.9.2 and fixed some
 files/dirs that 1.9.1 couldn't repair reliably, hoping the following
 problem would go away too (it didn't).

 There are some peculiarities in my setup: I use USB disks connected to a
 single server, so all storage nodes run on the same server, although each
 lives on a physical disk that can easily be taken off-site to increase the
 durability of the whole storage. At the time of failure there were 7 such
 storage nodes in my setup, and my whole store was fully repaired on these
 7 nodes; all content is/was uploaded with
 shares.needed = 4
 shares.happy = 6
 shares.total = 6

 Although 7 >= 6, I get this error when trying to `tahoe cp` a new file:

 {{{
 Traceback (most recent call last):
   File "/usr/lib64/python2.7/site-packages/foolscap/call.py", line 677, in _done
     self.request.complete(res)
   File "/usr/lib64/python2.7/site-packages/foolscap/call.py", line 60, in complete
     self.deferred.callback(res)
   File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 361, in callback
     self._startRunCallbacks(result)
   File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
     self._runCallbacks()
 --- <exception caught here> ---
   File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 542, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 553, in _got_response
     return self._loop()
   File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 404, in _loop
     return self._failed("%s (%s)" % (failmsg, self._get_progress_message()))
   File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 566, in _failed
     raise UploadUnhappinessError(msg)
 allmydata.interfaces.UploadUnhappinessError: shares could be placed on
 only 5 server(s) such that any 4 of them have enough shares to recover the
 file, but we were asked to place shares on at least 6 such servers.
 (placed all 6 shares, want to place shares on at least 6 servers such that
 any 4 of them have enough shares to recover the file, sent 5 queries to 5
 servers, 5 queries placed some shares, 0 placed none (of which 0 placed
 none due to the server being full and 0 placed none due to an error))
 }}}

 I recently found out about flogtool, so I ran it on the client node (which
 is also one of the 7 storage nodes, by the way). I've pasted only the last
 part, from CHKUploader (I can attach the whole log if need be):
 {{{
 01:04:01.314 L20 []#2339 CHKUploader starting
 01:04:01.314 L20 []#2340 starting upload of
 <allmydata.immutable.upload.EncryptAnUploadable instance at 0x2c9b5a8>
 01:04:01.314 L20 []#2341 creating Encoder <Encoder for unknown storage
 index>
 01:04:01.314 L20 []#2342 file size: 4669394
 01:04:01.314 L10 []#2343 my encoding parameters: (4, 6, 6, 131072)
 01:04:01.314 L20 []#2344 got encoding parameters: 4/6/6 131072
 01:04:01.314 L20 []#2345 now setting up codec
 01:04:01.348 L20 []#2346 using storage index k5ga2
 01:04:01.348 L20 []#2347 <Tahoe2ServerSelector for upload k5ga2>(k5ga2):
 starting
 01:04:01.363 L10 []#2348 <Tahoe2ServerSelector for upload k5ga2>(k5ga2):
 response to allocate_buckets() from server zp6jpfeu: alreadygot=(0,),
 allocated=()
 01:04:01.372 L10 []#2349 <Tahoe2ServerSelector for upload k5ga2>(k5ga2):
 response to allocate_buckets() from server pa2myijh: alreadygot=(2,),
 allocated=(1,)
 01:04:01.375 L20 []#2350 storage: allocate_buckets
 k5ga2suaoz7gju523f5ni3mswe
 01:04:01.377 L10 []#2351 <Tahoe2ServerSelector for upload k5ga2>(k5ga2):
 response to allocate_buckets() from server omkzwfx5: alreadygot=(3,),
 allocated=()
 01:04:01.381 L10 []#2352 <Tahoe2ServerSelector for upload k5ga2>(k5ga2):
 response to allocate_buckets() from server wo6akhxt: alreadygot=(4,),
 allocated=()
 01:04:01.404 L10 []#2353 <Tahoe2ServerSelector for upload k5ga2>(k5ga2):
 response to allocate_buckets() from server ughwvrtu: alreadygot=(),
 allocated=(5,)
 01:04:01.405 L25 []#2354 <Tahoe2ServerSelector for upload k5ga2>(k5ga2):
 server selection unsuccessful for <Tahoe2ServerSelector for upload k5ga2>:
 shares could be placed on only 5 server(s) such that any 4 of them have
 enough shares to recover the file, but we were asked to place shares on at
 least 6 such servers. (placed all 6 shares, want to place shares on at
 least 6 servers such that any 4 of them have enough shares to recover the
 file, sent 5 queries to 5 servers, 5 queries placed some shares, 0 placed
 none (of which 0 placed none due to the server being full and 0 placed
 none due to an error)), merged=sh0: zp6jpfeu, sh1: pa2myijh, sh2:
 pa2myijh, sh3: omkzwfx5, sh4: wo6akhxt, sh5: ughwvrtu
 01:04:01.407 L20 []#2355 web: 127.0.0.1 PUT /uri/[CENSORED].. 500 1644
 }}}
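
 For what it's worth, the "placed on only 5 server(s)" count matches what
 you get if happiness is computed as the size of a maximum matching between
 shares and distinct servers in the merged placement above (note sh1 and
 sh2 both landed on pa2myijh). A minimal sketch, assuming that
 matching-based definition; the server IDs come from the log, but the
 function is illustrative, not Tahoe's actual code:

```python
# Hedged sketch: treat "servers of happiness" as the size of a maximum
# matching in the bipartite graph of shares vs. the servers holding them.
# Placement taken from the "merged=" line in the flog output above.
placement = {
    "sh0": {"zp6jpfeu"},
    "sh1": {"pa2myijh"},
    "sh2": {"pa2myijh"},  # duplicate server: sh1 and sh2 collide here
    "sh3": {"omkzwfx5"},
    "sh4": {"wo6akhxt"},
    "sh5": {"ughwvrtu"},
}

def happiness(placement):
    """Maximum bipartite matching (augmenting-path method)."""
    match = {}  # server -> share currently matched to it

    def try_assign(share, seen):
        # Try to match `share` to some server, recursively displacing
        # an already-matched share if it has an alternative server.
        for server in placement[share]:
            if server in seen:
                continue
            seen.add(server)
            if server not in match or try_assign(match[server], seen):
                match[server] = share
                return True
        return False

    return sum(try_assign(share, set()) for share in placement)

print(happiness(placement))  # 5: six shares, but only five distinct servers
```

 With this placement only 5 distinct servers can each be credited with a
 share, which is below happy=6 even though 7 servers exist, since two of
 them were never asked (only 5 queries were sent) and two shares sit on the
 same server.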

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1791>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list