[tahoe-lafs-trac-stream] [tahoe-lafs] #1791: UploadUnhappinessError with available storage nodes > shares.happy

tahoe-lafs trac at tahoe-lafs.org
Sat Jul 7 21:30:23 UTC 2012


#1791: UploadUnhappinessError with available storage nodes > shares.happy
------------------------------------+------------------------------------
     Reporter:  gyver               |      Owner:  gyver
         Type:  defect              |     Status:  new
     Priority:  major               |  Milestone:  1.10.0
    Component:  code-peerselection  |    Version:  1.9.2
   Resolution:                      |   Keywords:  happiness upload error
Launchpad Bug:                      |
------------------------------------+------------------------------------

Comment (by gyver):

 Replying to [comment:6 davidsarah]:
 > Please add the following just after line 225 (i.e. after
 {{{readonly_servers = }}}... and before {{{# decide upon the
 renewal/cancel secrets}}}...) of
 [source:1.9.2/src/allmydata/immutable/upload.py
 src/allmydata/immutable/upload.py in 1.9.2]:

 I may not have done it right: I got the same output, with this at the end:
 {{{
 23:09:02.238 L23 []#2436 an outbound callRemote (that we [omkz] sent to
 someone else [zqxq]) failed on the far end
 23:09:02.238 L10 []#2437  reqID=873, rref=<RemoteReference at 0x2e780d0>,
 methname=RILogObserver.foolscap.lothar.com.msg
 23:09:02.238 L10 []#2438  the REMOTE failure was:
  FAILURE:
  [CopiedFailure instance: Traceback from remote host -- Traceback (most
 recent call last):
    File "/usr/lib64/python2.7/site-packages/foolscap/slicers/root.py",
 line 107, in send
      d.callback(None)
    File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py",
 line 361, in callback
      self._startRunCallbacks(result)
    File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py",
 line 455, in _startRunCallbacks
      self._runCallbacks()
    File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py",
 line 542, in _runCallbacks
      current.result = callback(current.result, *args, **kw)
  --- <exception caught here> ---
    File "/usr/lib64/python2.7/site-packages/foolscap/banana.py", line 215,
 in produce
      slicer = self.newSlicerFor(obj)
    File "/usr/lib64/python2.7/site-packages/foolscap/banana.py", line 314,
 in newSlicerFor
      return topSlicer.slicerForObject(obj)
    File "/usr/lib64/python2.7/site-packages/foolscap/slicer.py", line 48,
 }}}

 BUT... I may have a lead from looking at the last error message in my
 original log dump:

 {{{
 server selection unsuccessful for <Tahoe2ServerSelector for upload k5ga2>:
 shares could be placed on only 5 server(s) [...], merged=sh0: zp6jpfeu,
 sh1: pa2myijh, sh2: pa2myijh, sh3: omkzwfx5, sh4: wo6akhxt, sh5: ughwvrtu
 }}}

 I assume the sh<n> are the shares to be placed; sh1 and sh2 were both
 assigned to pa2myijh. I'm not sure whether this distribution is the result
 of share detection (my guess) or of a share placement algorithm that could
 produce an invalid placement and needs a check before upload (late error
 detection isn't good practice, so I bet that's not the case).
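 To illustrate why that merged mapping fails the happiness check: "servers
 of happiness" is the size of a maximum matching between shares and the
 servers that can hold them, so two shares on the same server only count
 once. This is a rough sketch of that computation, not the actual
 Tahoe-LAFS implementation:

```python
# Rough sketch (not the actual Tahoe-LAFS code): happiness is the size
# of a maximum bipartite matching between shares and servers.
def happiness(placements):
    """placements: dict mapping share number -> set of server ids."""
    match = {}  # server id -> share currently matched to it

    def try_assign(share, seen):
        for server in placements[share]:
            if server in seen:
                continue
            seen.add(server)
            # Take the server if it is free, or if its current share
            # can be re-routed to some other server (augmenting path).
            if server not in match or try_assign(match[server], seen):
                match[server] = share
                return True
        return False

    return sum(1 for share in placements if try_assign(share, set()))

# The "merged" mapping from the log above: sh1 and sh2 both land on
# pa2myijh, so at most 5 distinct servers can be matched.
merged = {
    0: {"zp6jpfeu"}, 1: {"pa2myijh"}, 2: {"pa2myijh"},
    3: {"omkzwfx5"}, 4: {"wo6akhxt"}, 5: {"ughwvrtu"},
}
print(happiness(merged))  # -> 5
```

 With 6 shares but only 5 matchable servers, the result is below a
 shares.happy of 6, which would explain the UploadUnhappinessError.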

 What if these shares were already stored on pa2myijh '''before''' the
 upload attempt (due to past uploads with a buggy version, or whatever else
 happened in the store directory outside Tahoe's control)? Is the code able
 to detect such a case and re-upload one of the two shares on a free server
 (one without any of the 6 shares)? If not, this might be the cause of my
 problem (the file was part of a long list of files I tried to upload with
 only partial success weeks ago...), and my storage nodes are most probably
 polluted by "dangling" shares.
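 The kind of check being asked about could look something like this
 hypothetical sketch (again not Tahoe-LAFS code; the server id
 "newserver" is invented for the example): find servers holding more
 than one share and propose moving the extras to servers that hold none:

```python
# Hypothetical sketch: spot servers that ended up with more than one
# share of a file and suggest moving the extras to unused servers.
def propose_moves(share_to_server, all_servers):
    """share_to_server: dict share number -> server id holding it."""
    held = {}  # server id -> list of shares it holds
    for share, server in share_to_server.items():
        held.setdefault(server, []).append(share)
    free = [s for s in all_servers if s not in held]
    moves = []  # (share, from_server, to_server)
    for server, shares in held.items():
        for extra in shares[1:]:  # keep one share per server
            if free:
                moves.append((extra, server, free.pop()))
    return moves

# With the placement from the log and one extra writable server,
# sh2 could be moved off pa2myijh:
placement = {0: "zp6jpfeu", 1: "pa2myijh", 2: "pa2myijh",
             3: "omkzwfx5", 4: "wo6akhxt", 5: "ughwvrtu"}
servers = sorted(set(placement.values())) + ["newserver"]
print(propose_moves(placement, servers))
# -> [(2, 'pa2myijh', 'newserver')]
```

 If pre-existing "dangling" shares are merged into the placement without
 such a redistribution step, a server already holding one share would pin
 two shares to itself, exactly as the log shows for pa2myijh.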

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1791#comment:9>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list