#1791 new defect

UploadUnhappinessError with available storage nodes > shares.happy — at Initial Version

Reported by: gyver
Owned by: davidsarah
Priority: major
Milestone: soon
Component: code-peerselection
Version: 1.9.2
Keywords: servers-of-happiness upload error
Cc: zooko, vladimir@…
Launchpad Bug:

Description

The error happened with 1.9.1 too. I just upgraded to 1.9.2 and fixed some files/directories that 1.9.1 couldn't repair reliably, hoping the following problem would go away too (it didn't).

My setup has some peculiarities: I use USB disks connected to a single server, so all storage nodes run on the same server, although each one is physically on a disk that can easily be sent off-site to increase the durability of the whole store. At the time of failure there were 7 such storage nodes, and my whole store was fully repaired on these 7 nodes. All the content is/was uploaded with shares.needed = 4, shares.happy = 6, shares.total = 6.

Although 7 >= 6, I get this error when trying to tahoe cp a new file:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/foolscap/call.py", line 677, in _done
    self.request.complete(res)
  File "/usr/lib64/python2.7/site-packages/foolscap/call.py", line 60, in complete
    self.deferred.callback(res)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 553, in _got_response
    return self._loop()
  File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 404, in _loop
    return self._failed("%s (%s)" % (failmsg, self._get_progress_message()))
  File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 566, in _failed
    raise UploadUnhappinessError(msg)
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: shares could be placed on only 5 server(s) such that any 4 of them have enough shares to recover the file, but we were asked to place shares on at least 6 such servers. (placed all 6 shares, want to place shares on at least 6 servers such that any 4 of them have enough shares to recover the file, sent 5 queries to 5 servers, 5 queries placed some shares, 0 placed none (of which 0 placed none due to the server being full and 0 placed none due to an error))

I recently found out about flogtool, so I ran it on the client node (which is one of the 7 storage nodes, btw). I pasted only the last part, from CHKUploader (I can attach the whole log if need be):

01:04:01.314 L20 []#2339 CHKUploader starting
01:04:01.314 L20 []#2340 starting upload of <allmydata.immutable.upload.EncryptAnUploadable instance at 0x2c9b5a8>
01:04:01.314 L20 []#2341 creating Encoder <Encoder for unknown storage index>
01:04:01.314 L20 []#2342 file size: 4669394
01:04:01.314 L10 []#2343 my encoding parameters: (4, 6, 6, 131072)
01:04:01.314 L20 []#2344 got encoding parameters: 4/6/6 131072
01:04:01.314 L20 []#2345 now setting up codec
01:04:01.348 L20 []#2346 using storage index k5ga2
01:04:01.348 L20 []#2347 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): starting
01:04:01.363 L10 []#2348 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server zp6jpfeu: alreadygot=(0,), allocated=()
01:04:01.372 L10 []#2349 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server pa2myijh: alreadygot=(2,), allocated=(1,)
01:04:01.375 L20 []#2350 storage: allocate_buckets k5ga2suaoz7gju523f5ni3mswe
01:04:01.377 L10 []#2351 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server omkzwfx5: alreadygot=(3,), allocated=()
01:04:01.381 L10 []#2352 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server wo6akhxt: alreadygot=(4,), allocated=()
01:04:01.404 L10 []#2353 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server ughwvrtu: alreadygot=(), allocated=(5,)
01:04:01.405 L25 []#2354 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): server selection unsuccessful for <Tahoe2ServerSelector for upload k5ga2>: shares could be placed on only 5 server(s) such that any 4 of them have enough shares to recover the file, but we were asked to place shares on at least 6 such servers. (placed all 6 shares, want to place shares on at least 6 servers such that any 4 of them have enough shares to recover the file, sent 5 queries to 5 servers, 5 queries placed some shares, 0 placed none (of which 0 placed none due to the server being full and 0 placed none due to an error)), merged=sh0: zp6jpfeu, sh1: pa2myijh, sh2: pa2myijh, sh3: omkzwfx5, sh4: wo6akhxt, sh5: ughwvrtu
01:04:01.407 L20 []#2355 web: 127.0.0.1 PUT /uri/[CENSORED].. 500 1644
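
The "only 5 server(s)" figure can be reproduced by hand from the merged= map in the last selector line. The following is a minimal sketch, not Tahoe-LAFS code: it assumes that servers-of-happiness is the size of a maximum matching between shares and servers, and the helper name happiness, the dict layout, and the server name "hypothetical6" are made up for illustration.

```python
# Merged share map copied from the final Tahoe2ServerSelector log line:
# share number -> set of servers holding (or allocated) that share.
merged = {
    0: {"zp6jpfeu"},
    1: {"pa2myijh"},
    2: {"pa2myijh"},
    3: {"omkzwfx5"},
    4: {"wo6akhxt"},
    5: {"ughwvrtu"},
}


def happiness(sharemap):
    """Size of a maximum share<->server matching (hypothetical helper).

    Uses simple augmenting paths: each share tries to claim a server,
    displacing an earlier share only if that share can move elsewhere.
    """
    match = {}  # server -> share currently matched to it

    def try_assign(share, seen):
        for server in sorted(sharemap[share]):
            if server in seen:
                continue
            seen.add(server)
            if server not in match or try_assign(match[server], seen):
                match[server] = share
                return True
        return False

    return sum(1 for share in sharemap if try_assign(share, set()))


# sh1 and sh2 both sit on pa2myijh, so only 5 distinct servers can be matched.
print(happiness(merged))  # 5

# Assumption for illustration: if sh2 lived on a sixth server instead,
# the same map would reach shares.happy = 6.
moved = dict(merged)
moved[2] = {"hypothetical6"}
print(happiness(moved))  # 6
```

Under that assumption, the map from the log yields 5 because sh1 and sh2 share one server, and only 5 of the 7 storage nodes appear in the log at all.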

Change History (0)
