[tahoe-lafs-trac-stream] [tahoe-lafs] #1791: UploadUnhappinessError with available storage nodes > shares.happy
tahoe-lafs
trac at tahoe-lafs.org
Tue Jul 9 14:32:38 UTC 2013
#1791: UploadUnhappinessError with available storage nodes > shares.happy
---------------------------------+---------------------------------------------
  Reporter: gyver                | Owner: gyver
      Type: defect               | Status: new
  Priority: major                | Milestone: 1.11.0
 Component: code-peerselection   | Version: 1.9.2
  Keywords: servers-of-happiness upload error
Resolution:
Launchpad Bug:
---------------------------------+---------------------------------------------
New description:
This error happened with 1.9.1 too. I just upgraded to 1.9.2 and fixed some
files/dirs that 1.9.1 couldn't repair reliably, hoping the following problem
would go away too (it didn't).
There are some peculiarities in my setup: I use USB disks connected to a
single server, so all storage nodes run on the same server, although each
lives on a disk that can easily be taken off-site to increase the durability
of the whole store. At the time of failure there were 7 such storage nodes in
my setup, and my whole store was fully repaired across these 7 nodes. All the
content is/was uploaded with:
shares.needed = 4
shares.happy = 6
shares.total = 6
Although 7 >= 6, I get this error when trying to tahoe cp a new file:
{{{
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/foolscap/call.py", line 677, in _done
    self.request.complete(res)
  File "/usr/lib64/python2.7/site-packages/foolscap/call.py", line 60, in complete
    self.deferred.callback(res)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  --- <exception caught here> ---
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 553, in _got_response
    return self._loop()
  File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 404, in _loop
    return self._failed("%s (%s)" % (failmsg, self._get_progress_message()))
  File "/usr/lib64/python2.7/site-packages/allmydata/immutable/upload.py", line 566, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: shares could be placed on only 5
server(s) such that any 4 of them have enough shares to recover the file, but
we were asked to place shares on at least 6 such servers. (placed all 6
shares, want to place shares on at least 6 servers such that any 4 of them
have enough shares to recover the file, sent 5 queries to 5 servers, 5
queries placed some shares, 0 placed none (of which 0 placed none due to the
server being full and 0 placed none due to an error))
}}}
I recently found out about flogtool, so I ran it on the client node (which is
one of the 7 storage nodes, btw). I've only pasted the last part, from
CHKUploader (I can attach the whole log if need be):
{{{
01:04:01.314 L20 []#2339 CHKUploader starting
01:04:01.314 L20 []#2340 starting upload of <allmydata.immutable.upload.EncryptAnUploadable instance at 0x2c9b5a8>
01:04:01.314 L20 []#2341 creating Encoder <Encoder for unknown storage index>
01:04:01.314 L20 []#2342 file size: 4669394
01:04:01.314 L10 []#2343 my encoding parameters: (4, 6, 6, 131072)
01:04:01.314 L20 []#2344 got encoding parameters: 4/6/6 131072
01:04:01.314 L20 []#2345 now setting up codec
01:04:01.348 L20 []#2346 using storage index k5ga2
01:04:01.348 L20 []#2347 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): starting
01:04:01.363 L10 []#2348 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server zp6jpfeu: alreadygot=(0,), allocated=()
01:04:01.372 L10 []#2349 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server pa2myijh: alreadygot=(2,), allocated=(1,)
01:04:01.375 L20 []#2350 storage: allocate_buckets k5ga2suaoz7gju523f5ni3mswe
01:04:01.377 L10 []#2351 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server omkzwfx5: alreadygot=(3,), allocated=()
01:04:01.381 L10 []#2352 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server wo6akhxt: alreadygot=(4,), allocated=()
01:04:01.404 L10 []#2353 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): response to allocate_buckets() from server ughwvrtu: alreadygot=(), allocated=(5,)
01:04:01.405 L25 []#2354 <Tahoe2ServerSelector for upload k5ga2>(k5ga2): server selection unsuccessful for <Tahoe2ServerSelector for upload k5ga2>: shares could be placed on only 5 server(s) such that any 4 of them have enough shares to recover the file, but we were asked to place shares on at least 6 such servers. (placed all 6 shares, want to place shares on at least 6 servers such that any 4 of them have enough shares to recover the file, sent 5 queries to 5 servers, 5 queries placed some shares, 0 placed none (of which 0 placed none due to the server being full and 0 placed none due to an error)), merged=sh0: zp6jpfeu, sh1: pa2myijh, sh2: pa2myijh, sh3: omkzwfx5, sh4: wo6akhxt, sh5: ughwvrtu
01:04:01.407 L20 []#2355 web: 127.0.0.1 PUT /uri/[CENSORED].. 500 1644
}}}
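To make the failure concrete: the "servers of happiness" count is the size of a maximum matching between shares and the servers holding them, so a server holding two shares (pa2myijh holds sh1 and sh2 in the merged placement above) only counts once. The following is a minimal sketch of that metric, not Tahoe's actual implementation; the function name and data layout are mine:

```python
# Sketch (not Tahoe's code) of the servers-of-happiness metric:
# the size of a maximum matching in the bipartite graph whose edges
# connect each share to the servers that hold it.

def happiness(placements):
    """placements: dict mapping share number -> set of server ids."""
    match = {}  # server id -> share currently matched to it

    def augment(share, seen):
        # Try to match `share` to some server, possibly re-matching
        # a previously matched share along an augmenting path.
        for server in placements[share]:
            if server in seen:
                continue
            seen.add(server)
            if server not in match or augment(match[server], seen):
                match[server] = share
                return True
        return False

    return sum(1 for share in placements if augment(share, set()))

# The merged placement from the flog above: 6 shares, but pa2myijh
# holds two of them, so only 5 distinct servers are used.
merged = {
    0: {"zp6jpfeu"}, 1: {"pa2myijh"}, 2: {"pa2myijh"},
    3: {"omkzwfx5"}, 4: {"wo6akhxt"}, 5: {"ughwvrtu"},
}
print(happiness(merged))  # 5, which is < shares.happy = 6
```

This shows why the upload fails even though 7 servers exist: the selector sent queries to only 5 servers, and pre-existing shares pinned two shares to one of them, so happiness tops out at 5.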
--
Comment (by daira):
Same bug as #2016?
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1791#comment:16>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream
mailing list