#2116 closed defect (fixed)

xml parse error from S4 service

Reported by: zooko Owned by: daira
Priority: major Milestone: undecided
Component: code-storage Version: 1.9.0-s3branch
Keywords: s3 error txaws retry LeastAuthority Cc:
Launchpad Bug:

Description (last modified by daira)

zooko@spark ~/research/papers $ tahoe backup . $WRITABLE_DIRCAP:
Traceback (most recent call last):
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 325, in backup
    return bu.run()
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 194, in process
    childcap, metadata = self.upload(childpath)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 305, in upload
    raise HTTPError("Error during file PUT", resp)
HTTPError: Error during file PUT: 500 Internal Server Error
"Traceback (most recent call last):
  File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 753, in receiveClose
    self.request.fail(f)
  File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 95, in fail
    self.deferred.errback(why)
  File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 421, in errback
    self._startRunCallbacks(fail)
  File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 488, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 604, in _got_response
    return self._loop()
  File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 516, in _loop
    return self._failed(msg)
  File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 617, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: server selection failed for 
<Tahoe2ServerSelector for upload qd7vq>: shares could be placed or found on 
only 0 server(s). We were asked to place shares on at least 1 server(s) such 
that any 1 of them have enough shares to recover the file. (placed 0 shares out 
of 1 total (1 homeless), want to place shares on at least 1 servers such that 
any 1 of them have enough shares to recover the file, sent 1 queries to 1 
servers, 0 queries placed some shares, 1 placed none (of which 0 placed none 
due to the server being full and 1 placed none due to an error)) (last failure 
(from <ServerTracker for server 6dwd2q and SI qd7vq>) was:
[Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>:
<RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 287, in connectionLost
    protocol.connectionLost(reason)
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 193, in connectionLost
    self.factory._disconnectedDeferred.callback(None)
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 368, in callback
    self._startRunCallbacks(result)
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 464, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> --- 
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/s3/client.py\", line 153, in _parse_get_bucket
    root = XML(xml_bytes)
  File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/util.py\", line 61, in XML
    return parser.close()
  File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1655, in close
    self._raiseerror(v)
  File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
]'>
])
"

Change History (29)

comment:1 Changed at 2013-11-26T04:34:59Z by zooko

I tried the same command-line again, and this time it worked. I had started my tahoe-lafs gateway a few seconds before I got this xml parse error, so I suspect that this is a case of #719.

comment:2 Changed at 2013-11-26T17:00:40Z by daira

I don't think it is a case of #719: the S3 container code in the server got as far as making an S3 request and trying to parse the result, which was not valid XML. #719 refers to the case where the client can't reach enough servers.

Last edited at 2013-11-26T19:18:17Z by daira

comment:3 Changed at 2013-11-26T17:01:10Z by daira

Actually the response from S3 seems to have been empty.

comment:4 follow-up: Changed at 2013-11-26T17:08:41Z by daira

  • Keywords s3 error txaws retry LeastAuthority added
  • Owner set to daira
  • Status changed from new to assigned
  • Version changed from 1.10.0 to 1.9.0-s3branch

Hmm, maybe an empty S3 response should trigger a retry?

comment:5 Changed at 2013-11-26T17:13:13Z by daira

Sigh, I wish we could reduce that error report to something more concise -- although I suppose we should be grateful that the actual error was in there somewhere.

Last edited at 2013-11-26T17:13:26Z by daira

comment:6 in reply to: ↑ 4 Changed at 2013-11-27T19:59:27Z by zooko

Replying to daira:

Hmm, maybe an empty S3 response should trigger a retry?

I guess. :-/

This is what I think of as "snowshoes": having our upper layer perform redundancy or retries just to work around what appear to be bugs or unreliability in the lower layer. Ick.

comment:7 follow-up: Changed at 2013-11-27T20:01:57Z by daira

Well, I'm philosophically resistant to that as well, but I have no confidence that we can get the underlying problem with S3 fixed.

comment:8 in reply to: ↑ 7 Changed at 2013-11-27T20:03:38Z by zooko

Replying to daira:

Well, I'm philosophically resistant to that as well, but I have no confidence that we can get the underlying problem with S3 fixed.

Yeah, I agree that we should go ahead and add a limited number of exponential-backoff-retries for this.
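The agreed-on approach could look something like the sketch below. It is a synchronous, pure-Python illustration only; the real server code is Twisted-based and the names and delay values here are made up, not the actual s3branch API.

```python
import time

# Illustrative delays; the actual retry table in the s3branch may differ.
BACKOFF_SECONDS_BEFORE_RETRY = (1, 2, 4)

class CloudServiceError(Exception):
    """Stand-in for a transient cloud-service failure."""

def retry_on_empty(operation, sleep=time.sleep):
    """Run operation(); retry with exponential backoff if it returns an
    empty body or raises CloudServiceError, up to a fixed retry limit."""
    last_error = None
    for delay in BACKOFF_SECONDS_BEFORE_RETRY + (None,):
        try:
            body = operation()
            if body:  # treat an empty response body as transient
                return body
            last_error = CloudServiceError("empty response body")
        except CloudServiceError as e:
            last_error = e
        if delay is None:  # retries exhausted
            break
        sleep(delay)
    raise last_error
```

Injecting the `sleep` callable keeps the backoff schedule testable without actually waiting.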

comment:9 Changed at 2013-11-27T20:04:42Z by daira

Note that this isn't the first time I've seen an empty response from S3. They are rare, but they've come up before - we just haven't recorded that problem until now.

comment:10 Changed at 2013-11-28T12:52:35Z by zooko

I just got a similar failure again:

Traceback (most recent call last):
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 325, in backup
    return bu.run()
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 194, in process
    childcap, metadata = self.upload(childpath)
  File "/home/zooko/playground/LAFS/pristine/src/allmydata/scripts/tahoe_backup.py", line 305, in upload
    raise HTTPError("Error during file PUT", resp)
HTTPError: Error during file PUT: 500 Internal Server Error
"Traceback (most recent call last):
  File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 753, in receiveClose
    self.request.fail(f)
  File \"/home/zooko/playground/LAFS/pristine/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/call.py\", line 95, in fail
    self.deferred.errback(why)
  File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 421, in errback
    self._startRunCallbacks(fail)
  File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 488, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/home/zooko/.local/lib/python2.7/site-packages/twisted/internet/defer.py\", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 604, in _got_response
    return self._loop()
  File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 516, in _loop
    return self._failed(msg)
  File \"/home/zooko/playground/LAFS/pristine/src/allmydata/immutable/upload.py\", line 617, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: server selection failed for <Tahoe2ServerSelector 
for upload s7uqw>: shares could be placed or found on only 0 server(s). We were asked to place 
shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file. 
(placed 0 shares out of 1 total (1 homeless), want to place shares on at least 1 servers such 
that any 1 of them have enough shares to recover the file, sent 1 queries to 1 servers, 0 
queries placed some shares, 1 placed none (of which 0 placed none due to the server being full 
and 1 placed none due to an error)) (last failure (from <ServerTracker for server 6dwd2q and SI 
s7uqw>) was:
[Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 287, in connectionLost
    protocol.connectionLost(reason)
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 193, in connectionLost
    self.factory._disconnectedDeferred.callback(None)
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 368, in callback
    self._startRunCallbacks(result)
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 464, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/home/customer/LAFS_source/support/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/s3/client.py\", line 153, in _parse_get_bucket
    root = XML(xml_bytes)
  File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/util.py\", line 61, in XML
    return parser.close()
  File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1655, in close
    self._raiseerror(v)
  File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
]'>
])
"
Last edited at 2014-03-17T22:00:44Z by daira

comment:11 Changed at 2013-11-29T05:42:21Z by zooko

Hm, I seem to get this every time that I run a full tahoe backup on my data now. ☹ It seems like the LeastAuthority S4 service is suffering badly from this.

  File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/s3/client.py\", line 153, in _parse_get_bucket
    root = XML(xml_bytes)
  File \"/usr/local/lib/python2.7/dist-packages/txAWS-0.2.1.post4-py2.7.egg/txaws/util.py\", line 61, in XML
    return parser.close()
  File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1655, in close
    self._raiseerror(v)
  File \"/usr/lib/python2.7/xml/etree/ElementTree.py\", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
]'>
])
"

comment:12 Changed at 2013-11-30T20:55:30Z by daira

  • Priority changed from normal to major

comment:13 Changed at 2013-12-01T16:21:22Z by zooko

I just got this same error again, but this time on read instead of on upload. Manually retrying a couple of times (by hitting C-r in my web browser) got it to work.

comment:15 Changed at 2014-03-04T19:30:43Z by daira

I'm going to make it retry.

comment:16 follow-up: Changed at 2014-03-10T21:45:28Z by daira

Note that the error occurs in txAWS when it tries to parse an empty response body for a 500 error as XML. The fix might need to be in txAWS as well.

It's a bit difficult to test this, because the tests intentionally do not depend on txAWS (currently they only test the mock cloud backend, not the service container implementations).
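A self-contained reproduction of the failure mode needs no txAWS at all, only a stub whose response body is empty. Everything below is hypothetical test scaffolding, not the actual test suite; `parse_get_bucket` just mirrors the first step of `txaws.s3.client._parse_get_bucket`.

```python
import xml.etree.ElementTree as ET

class StubS3Client:
    """Stand-in for an S3 client whose get_bucket returns success
    with an empty body, as observed in this ticket."""
    def get_bucket(self, name):
        return b""

def parse_get_bucket(xml_bytes):
    # mirrors the first step of txaws.s3.client._parse_get_bucket
    return ET.fromstring(xml_bytes)

client = StubS3Client()
try:
    parse_get_bucket(client.get_bucket("shares"))
except ET.ParseError as e:
    print(e)  # prints: no element found: line 1, column 0
```

This reproduces exactly the `ParseError` seen in the remote tracebacks, without any network or service container involved.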

comment:17 in reply to: ↑ 16 Changed at 2014-03-11T15:55:09Z by daira

Replying to daira:

Note that the error occurs in txAWS when it tries to parse an empty response body for a 500 error as XML.

No, this is a wrong inference. The error occurs in txaws.s3.client._parse_get_bucket, which should not be reached if the S3 server responded with a 500 error. (The 500 error in the stack trace is from the Tahoe-LAFS gateway, not S3.) So it looks like S3 returned success with an empty body.
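Given that inference, one hedged fix sketch on the parsing side is to treat an empty success body as a transient, retryable service error instead of handing it to the XML parser. The function and exception names here are made up for illustration, not txAWS API.

```python
import xml.etree.ElementTree as ET

class EmptyResponseError(Exception):
    """The service returned success but no body; callers may retry."""

def parse_bucket_listing(xml_bytes):
    if not xml_bytes or not xml_bytes.strip():
        # An empty 200 body is not valid XML; surface it as a retryable
        # condition instead of letting ElementTree raise a ParseError.
        raise EmptyResponseError("S3 returned success with an empty body")
    return ET.fromstring(xml_bytes)
```

A retry loop (like the backoff sketch earlier in this ticket) can then catch `EmptyResponseError` specifically, rather than pattern-matching on `ParseError` text.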

comment:18 Changed at 2014-03-17T22:03:07Z by daira

  • Description modified

comment:19 Changed at 2014-03-17T22:23:31Z by daira

Pasting the definitions from txaws.util into a Python interactive session:

$ python
Python 2.7.5+ (default, Jun  2 2013, 13:26:34) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from xml.etree.ElementTree import XMLTreeBuilder
>>> def XML(text):
...     parser = NamespaceFixXmlTreeBuilder()
...     parser.feed(text)
...     return parser.close()
... 
>>> class NamespaceFixXmlTreeBuilder(XMLTreeBuilder):
...     def _fixname(self, key):
...         if "}" in key:
...             key = key.split("}", 1)[1]
...         return key
... 
>>> XML("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in XML
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1654, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
Last edited at 2014-03-17T22:26:38Z by daira

comment:20 Changed at 2014-03-17T22:25:24Z by daira

It looks like only an empty string will give the "xml.etree.ElementTree.ParseError: no element found: line 1, column 0" error.

comment:21 Changed at 2014-03-18T17:00:13Z by CyberAxe

Got this last night. Did Za do the update on my backend? If so, it didn't work.

:: ---------------------  START of LOG  ------------ 
:: START Date: 20141803	Time:  24638 
:: Backup-Now assuming tahoe: as alias and BackedUp as Target Location in Grid 
:: c:\Python27\ must be in system PATH: see tahoe-lafs Windows wiki for help 
:: Will Backup everything in folder c:\BackMeUp\ 
:: v0040  
:: ---------------------  Start Backup  ------------ 
:: ---------------------  Errors Backup  ----------- 
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 194, in process
    childcap, metadata = self.upload(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 305, in upload
    raise HTTPError("Error during file PUT", resp)
HTTPError: Error during file PUT: 500 Internal Server Error
"Traceback (most recent call last):
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\util\\pipeline.py\", line 65, in _ebDeferred
    self.checkForFinished()
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\util\\pipeline.py\", line 53, in checkForFinished
    self.errback(self.failure)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 422, in errback
    self._startRunCallbacks(fail)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\encode.py\", line 501, in _remove_shareholder
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file.: [Failure instance: Traceback (failure with no frames): <class 'allmydata.util.pipeline.PipelineError'>: <PipelineError error=([Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
]'>
])>
]
"
:: ---------------------  End Backup  -------------- 
Last edited at 2014-03-18T17:07:30Z by zooko

comment:22 follow-up: Changed at 2014-03-21T18:21:35Z by CyberAxe

:: ---------------------  START of LOG  ------------ 
:: START Date: 20142103	Time: 21327 
:: Backup-Now assuming tahoe: as alias and BackedUp as Target Location in Grid 
:: c:\Python27\ must be in system PATH: see tahoe-lafs Windows wiki for help 
:: Will Backup everything in folder c:\BackMeUp\ 
:: v0040  
:: ---------------------  Start Backup  ------------ 
:: ---------------------  Errors Backup  ----------- 

Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 194, in process
    childcap, metadata = self.upload(childpath)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 305, in upload
    raise HTTPError("Error during file PUT", resp)
HTTPError: Error during file PUT: 500 Internal Server Error
"Traceback (most recent call last):
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 753, in receiveClose
    self.request.fail(f)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 95, in fail
    self.deferred.errback(why)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 422, in errback
    self._startRunCallbacks(fail)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 604, in _got_response
    return self._loop()
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 516, in _loop
    return self._failed(msg)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 617, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: server selection failed for <Tahoe2ServerSelector for upload qtquq>: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file. (placed 0 shares out of 1 total (1 homeless), want to place shares on at least 1 servers such that any 1 of them have enough shares to recover the file, sent 1 queries to 1 servers, 0 queries placed some shares, 1 placed none (of which 0 placed none due to the server being full and 1 placed none due to an error)) (last failure (from <ServerTracker for server xfia4afr and SI qtquq>) was: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 299, in connectionLost
    protocol.connectionLost(reason)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 204, in connectionLost
    self.factory._disconnectedDeferred.callback(None)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback
    self._startRunCallbacks(result)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 370, in _retry
    d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs)
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 417, in _handle_error
    d = task.deferLater(self._reactor, BACKOFF_SECONDS_BEFORE_RETRY[trynum-1], operation, *args, **kwargs)
exceptions.AttributeError: S3Container instance has no attribute '_reactor'
]'>
])
"}}}
:: ---------------------  End Backup  -------------- 
:: ---------------------  Start View Latest  ------- 
:: ---------------------  End View Latest  --------- 
:: ---------------------  Start Deep Check   ------- 
:: SET TO RUN  
:: ---------------------  End Deep Check   --------- 
:: ---------------------  Start Grid Stats  -------- 
:: SET TO RUN  
:: ---------------------  End Grid Stats   --------- 
:: ---------------------  Start Log Backup   ------- 
:: Copy local log files to future backups to GRID 
:: ---------------------  END Log Backup     ------- 
:: END Date: Fri 03/21/2014	Time:  2:49:27.69 
:: ---------------------  END of LOG  -------------- 
Last edited at 2014-03-21T18:44:17Z by CyberAxe

comment:23 follow-up: Changed at 2014-03-21T21:13:23Z by CyberAxe

(3:10:26 PM) CyberAxe: Seems like instead of the 500 error now I'm getting it just stuck over and over; maybe I'm not waiting long enough for the 500 error to generate, but I've waited hours. http://sebsauvage.net/paste/?f38e2b33f87e1422#A8iAnduHoydJrE6XejuwYX6CHoRx/iWBxUb+OLx1mvM= I have to terminate the process; here's what it says when I do that.
(3:12:20 PM) CyberAxe: The last line it's stuck on in this example is a create-folder action; when I restart backup it does use that folder, so it is creating it:
re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
re-using old directory for 'c:\BackMeUp\WesternColorado_files'
re-using old directory for 'c:\BackMeUp\' <----- where it got stuck.

comment:24 Changed at 2014-03-22T02:23:07Z by CyberAxe

Here's another traceback error I got after the backend update to fix the 500 error:

 File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 422, in errback
    self._startRunCallbacks(fail)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\encode.py\", line 501, in _remove_shareholder
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file.: [Failure instance: Traceback (failure with no frames): <class 'allmydata.util.pipeline.PipelineError'>: <PipelineError error=([Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/tcp.py\", line 299, in connectionLost
    protocol.connectionLost(reason)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/web/client.py\", line 204, in connectionLost
    self.factory._disconnectedDeferred.callback(None)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback
    self._startRunCallbacks(result)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 370, in _retry
    d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs)
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 417, in _handle_error
    d = task.deferLater(self._reactor, BACKOFF_SECONDS_BEFORE_RETRY[trynum-1], operation, *args, **kwargs)
exceptions.AttributeError: S3Container instance has no attribute '_reactor'
]'>
])> 
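The AttributeError in this traceback points at the retry change itself: `_handle_error` schedules the retry via `task.deferLater(self._reactor, ...)`, but nothing appears to have set `_reactor` on the `S3Container` instance. A minimal illustration of the bug shape and fix follows, with a pluggable scheduler standing in for the Twisted reactor; all names here are hypothetical, not the cloud_common.py code.

```python
class RetryingContainer:
    # Illustrative delays, mirroring a BACKOFF_SECONDS_BEFORE_RETRY table.
    BACKOFF = (1, 2, 4)

    def __init__(self, schedule):
        # The bug class: if this assignment is forgotten, the attribute is
        # unset and the first retry attempt dies with AttributeError,
        # exactly like 'instance has no attribute _reactor' above.
        self._schedule = schedule  # e.g. a task.deferLater(reactor, ...) wrapper

    def _handle_error(self, trynum, operation):
        # Look up the delay for this attempt and hand the retry to the scheduler.
        delay = self.BACKOFF[trynum - 1]
        return self._schedule(delay, operation)
```

Injecting the scheduler in `__init__` both guarantees the attribute exists and makes the retry path unit-testable with a fake clock.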

comment:25 Changed at 2014-03-22T02:27:43Z by CyberAxe

Here's another error after the last one

:: ---------------------  START of LOG  ------------ 
:: START Date: 20142103	Time: 174729 
:: Backup-Now assuming tahoe: as alias and BackedUp as Target Location in Grid 
:: c:\Python27\ must be in system PATH: see tahoe-lafs Windows wiki for help 
:: Will Backup everything in folder c:\BackMeUp\ 
:: v0040  
:: ---------------------  Start Backup  ------------ 
:: ---------------------  Errors Backup  ----------- 
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 123, in run
    put_child(archives_url, now, new_backup_dircap)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child
    raise HTTPError("Error during put_child", resp)
HTTPError: Error during put_child: 500 Internal Server Error
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):\\n  File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n    cb(*args, **kwargs)\\n  File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n    op(res)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n    self._startRunCallbacks(result)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 490, in _startRunCallbacks\\n    self._runCallbacks()\\n--- <exception caught here> ---\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n    current.result = callback(current.result, *args, **kw)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n    account.add_or_renew_default_lease(self.storage_index, shnum)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in add_or_renew_default_lease\\n    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n    renewal_time, expiration_time)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n    raise NonExistentShareError(si_s, shnum)\\nallmydata.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'o2ofk2psfahdhyf5uq3nqdmrwe\\' shnum=0 in \`shares\` table\\n]\\'>\\n]', None)
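For context on the NonExistentShareError above: the lease database refuses to add or renew a lease for a share that has no row in its `shares` table. A minimal stdlib-only sketch of that check (the schema, function name, and error class here are hypothetical illustrations modeled loosely on the traceback, not the actual leasedb.py code):

```python
import sqlite3

class NonExistentShareError(Exception):
    """Raised when a lease is added/renewed for a share not in `shares`."""

def add_or_renew_lease(conn, storage_index, shnum, renewal_time, expiration_time):
    # The share row must already exist; otherwise fail, as the traceback's
    # leasedb.py line 267 does with "can't find SI=... in `shares` table".
    row = conn.execute(
        "SELECT 1 FROM shares WHERE storage_index=? AND shnum=?",
        (storage_index, shnum)).fetchone()
    if row is None:
        raise NonExistentShareError(storage_index, shnum)
    conn.execute(
        "UPDATE shares SET renewal_time=?, expiration_time=? "
        "WHERE storage_index=? AND shnum=?",
        (renewal_time, expiration_time, storage_index, shnum))

# Hypothetical minimal schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shares (storage_index TEXT, shnum INTEGER, "
             "renewal_time REAL, expiration_time REAL)")
```

So the publish fails not because the server is unreachable, but because the lease-update step runs against a share entry that was never recorded (or was removed) in the lease database.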

comment:26 in reply to: ↑ 23 Changed at 2014-03-22T02:39:33Z by CyberAxe

I found that my script wasn't showing me that it had moved on to a deep-check and had finished.

Replying to CyberAxe:

(3:10:26 PM) CyberAxe: Seems like instead of the 500 error now I'm getting it just stuck over and over; maybe I'm not waiting long enough for the 500 error to generate, but I've waited hours. http://sebsauvage.net/paste/?f38e2b33f87e1422#A8iAnduHoydJrE6XejuwYX6CHoRx/iWBxUb+OLx1mvM= I have to terminate the process; here's what it says when I do that.
(3:12:20 PM) CyberAxe: The last line it's stuck on in this example is a create-folder action; when I restart the backup it does use that folder, so it is creating it. re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\WesternColorado_files'
(3:12:20 PM) CyberAxe: re-using old directory for 'c:\BackMeUp\' <----- where it got stuck.

Last edited at 2014-03-22T17:32:30Z by zooko (previous) (diff)

comment:27 Changed at 2014-03-22T17:27:39Z by CyberAxe

Here's what I'm getting now:

processing 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_0.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_1.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_0.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_1.jpg'..
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files'
 creating directory for 'c:\BackMeUp\'
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 123, in run
    put_child(archives_url, now, new_backup_dircap)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child
    raise HTTPError("Error during put_child", resp)
HTTPError: Error during put_child: 500 Internal Server Error
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):\\n  File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n    cb(*args, **kwargs)\\n  File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n    op(res)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n    self._startRunCallbacks(result)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 490, in _startRunCallbacks\\n    self._runCallbacks()\\n--- <exception caught here> ---\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n    current.result = callback(current.result, *args, **kw)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n    account.add_or_renew_default_lease(self.storage_index, shnum)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in add_or_renew_default_lease\\n    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n    renewal_time, expiration_time)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n    raise NonExistentShareError(si_s, shnum)\\nallmydata.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'o2ofk2psfahdhyf5uq3nqdmrwe\\' shnum=0 in \`shares\` table\\n]\\'>\\n]', None)

"

comment:28 in reply to: ↑ 22 Changed at 2014-03-25T03:10:49Z by daira

Replying to CyberAxe:

>   File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 370, in _retry
>     d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs)
>   File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_common.py\", line 417, in _handle_error
>     d = task.deferLater(self._reactor, BACKOFF_SECONDS_BEFORE_RETRY[trynum-1], operation, *args, **kwargs)
> exceptions.AttributeError: S3Container instance has no attribute '_reactor'

This is a separate bug in the S3 retry logic. Filed as #2206.
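For context on that retry bug: the failing code path is supposed to wait a backoff delay (indexed by the attempt number) and then retry the cloud operation, but it crashes before retrying because `self._reactor` was never set on `S3Container`. A minimal stdlib-only sketch of the intended retry-with-backoff pattern (the constant name `BACKOFF_SECONDS_BEFORE_RETRY` and the `trynum-1` indexing come from the traceback; the function itself is a hypothetical illustration, not the actual cloud_common.py code):

```python
# Sketch of retry-with-backoff, as suggested by the traceback's
# BACKOFF_SECONDS_BEFORE_RETRY[trynum-1]. Values are illustrative.
BACKOFF_SECONDS_BEFORE_RETRY = (1, 5, 10)

def retry_operation(operation, sleep,
                    max_tries=len(BACKOFF_SECONDS_BEFORE_RETRY) + 1):
    """Call operation(); on failure, sleep the backoff delay and retry.

    Raises the last exception once the backoff table is exhausted.
    """
    for trynum in range(1, max_tries + 1):
        try:
            return operation()
        except Exception:
            if trynum >= max_tries:
                raise
            # trynum-1 indexes the backoff table, as in the traceback.
            sleep(BACKOFF_SECONDS_BEFORE_RETRY[trynum - 1])
```

The AttributeError means the delayed-call machinery (the reactor used by `task.deferLater` to schedule that sleep) was missing on the container object, so the retry path itself blew up instead of retrying; that is what #2206 tracks.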

Last edited at 2014-03-25T03:11:18Z by daira (previous) (diff)

comment:29 Changed at 2014-04-15T00:49:57Z by daira

  • Resolution set to fixed
  • Status changed from assigned to closed