Opened at 2013-02-19T05:08:18Z
Last modified at 2020-10-30T12:35:44Z
#1920 closed defect
cloud backend: a failed upload can leave chunk objects that prevent subsequent uploads of the same file — at Version 4
Reported by: | davidsarah | Owned by: | daira |
---|---|---|---|
Priority: | normal | Milestone: | 1.15.0 |
Component: | code-storage | Version: | 1.9.2 |
Keywords: | cloud-backend openstack upload reliability | Cc: | |
Launchpad Bug: |
Description (last modified by daira)
When attempting to upload a ~10 MiB file using the OpenStack cloud backend (to a Rackspace account), the upload failed due to a DNS error:
<class 'allmydata.interfaces.UploadUnhappinessError'>: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file.: [Failure instance: Traceback (failure with no frames): <class 'allmydata.util.pipeline.PipelineError'>: <PipelineError error=([Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): Failure: twisted.internet.error.DNSLookupError: DNS lookup failed: address 'storage101.dfw1.clouddrive.com' not found: [Errno -2] Name or service not known. ]'> ])> ]
That error is not itself the subject of this ticket. The issue for this ticket is that subsequent uploads of the same immutable file also failed, even after the DNS error had resolved itself:
<class 'allmydata.interfaces.UploadUnhappinessError'>: server selection failed for <Tahoe2ServerSelector for upload sefes>: shares could be placed or found on only 0 server(s). We were asked to place shares on at least 1 server(s) such that any 1 of them have enough shares to recover the file. (placed 0 shares out of 1 total (1 homeless), want to place shares on at least 1 servers such that any 1 of them have enough shares to recover the file, sent 1 queries to 1 servers, 0 queries placed some shares, 1 placed none (of which 0 placed none due to the server being full and 1 placed none due to an error)) (last failure (from <ServerTracker for server kdu2jtww and SI sefes>) was: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  [...]
  File "/home/davidsarah/tahoe/git/working/src/allmydata/storage/backends/cloud/cloud_common.py", line 339, in _retry
    d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/davidsarah/tahoe/git/working/src/allmydata/storage/backends/cloud/openstack/openstack_container.py", line 81, in _got_response
    message="unexpected response code %r %s" % (response.code, response.phrase))
allmydata.storage.backends.cloud.cloud_common.CloudError: ("try 1 failed: GET object ('shares/se/sefeslgzc4su3i66b72aytmebm/0',) {}", 404, 'unexpected response code 404 Not Found', None) ]'> ])
Looking at the container contents via the Rackspace Cloud Files WUI, there is only one chunk object stored for this file, with the key:
shares/se/sefeslgzc4su3i66b72aytmebm/0.5
(i.e. the 6th chunk of share 0).
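To make the failure mode concrete, here is a small self-contained Python sketch (hypothetical, not the actual cloud backend code; the container class and key handling are stand-ins) of how a single leftover `0.5` chunk can make share 0 look present in a prefix listing, while the GET for chunk 0, whose key has no `.<index>` suffix, returns 404:

```python
# Hypothetical sketch, not the real cloud backend: an in-memory "container"
# holding only the leftover sixth chunk, and the prefix-listing vs. exact-key
# GET mismatch that produces the 404 above.

class FakeContainer:
    def __init__(self):
        self.objects = {}                     # key -> bytes

    def list_objects(self, prefix):
        return [k for k in self.objects if k.startswith(prefix)]

    def get_object(self, key):
        if key not in self.objects:
            raise Exception("404 Not Found: %s" % key)   # stands in for CloudError
        return self.objects[key]

SHARESET = "shares/se/sefeslgzc4su3i66b72aytmebm/"

container = FakeContainer()
# The interrupted upload left behind only chunk index 5 of share 0:
container.objects[SHARESET + "0.5"] = b"...partial share data..."

# A later upload checks which shares already exist by listing the prefix;
# the stray chunk makes share 0 appear to be present...
present_shnums = {k[len(SHARESET):].split(".")[0] for k in container.list_objects(SHARESET)}
print(present_shnums)                         # {'0'}

# ...but reading the share starts with its first chunk, whose key has no
# ".<index>" suffix, and that object was never written:
try:
    container.get_object(SHARESET + "0")
except Exception as e:
    print(e)                                  # 404 Not Found: shares/se/.../0
```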
Change History (4)
comment:1 Changed at 2013-02-19T05:11:37Z by davidsarah
comment:2 Changed at 2013-02-19T05:12:58Z by davidsarah
- Description modified (diff)
- Status changed from new to assigned
comment:3 Changed at 2013-02-20T18:36:26Z by zooko
I think this is one kind of failure that would be prevented by Two-Phase Commit (#1755).
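For illustration, here is a minimal sketch of the two-phase idea as I understand it (the `incoming/` staging prefix and all names below are assumptions, not the actual #1755 design): chunks are first written somewhere readers never look, and are only published under the real `shares/` prefix once every chunk is safely stored, so an interrupted upload cannot leave partial chunks where a later upload will trip over them.

```python
# Hypothetical two-phase upload sketch (not the #1755 design itself).

class Container:
    """Minimal in-memory stand-in for a cloud container."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        del self.objects[key]

    def list(self, prefix):
        return [k for k in self.objects if k.startswith(prefix)]


def upload_share(container, shareset, shnum, chunks):
    staging = "incoming/" + shareset          # assumed staging prefix
    final = "shares/" + shareset
    # Phase 1: write every chunk under the staging prefix. A DNS failure or
    # crash here leaves debris only under incoming/, never under shares/.
    for i, data in enumerate(chunks):
        suffix = str(shnum) if i == 0 else "%d.%d" % (shnum, i)
        container.put(staging + suffix, data)
    # Phase 2 (commit): every chunk is stored, so publish them under the
    # real prefix and then remove the staging copies.
    for key in container.list(staging):
        container.copy(key, final + key[len(staging):])
    for key in container.list(staging):
        container.delete(key)


c = Container()
upload_share(c, "se/sefeslgzc4su3i66b72aytmebm/", 0, [b"chunk0", b"chunk1"])
print(sorted(c.objects))   # only shares/... keys remain; incoming/ is empty
```

Anything left under `incoming/` by a failed upload could then be garbage-collected separately without affecting readers.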
comment:4 Changed at 2013-07-22T20:49:37Z by daira
- Description modified (diff)
- Milestone changed from undecided to 1.12.0
- Owner changed from davidsarah to daira
- Status changed from assigned to new
I suspect (but I'm not sure) that this is a generic cloud backend issue that could in principle also happen with S3. It may be less likely there, since we reduced the frequency of S3 errors by retrying.
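For context, the traceback above goes through `_retry`/`_handle_error` in `cloud_common.py`. Below is a generic, synchronous sketch of that kind of retry-with-backoff wrapper (hypothetical; the real code is Twisted/Deferred-based and its error classification may differ). Note that retrying cannot fix this particular bug: the 404 reflects the actual container state left by the earlier failed upload, not a transient error.

```python
# Generic retry-with-backoff sketch (hypothetical; not cloud_common.py).
import time

class CloudHTTPError(Exception):
    """Stand-in for a cloud backend error carrying an HTTP status code."""
    def __init__(self, code, message):
        super(CloudHTTPError, self).__init__(message)
        self.code = code

TRANSIENT_CODES = {500, 502, 503, 504}        # assumed set of retryable statuses

def retry(operation, attempts=5, first_delay=1.0):
    """Call operation() until it succeeds or attempts run out.

    Only errors judged transient are retried; a 404 like the one in this
    ticket is a state problem that retrying cannot fix.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except CloudHTTPError as e:
            if e.code not in TRANSIENT_CODES or attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2                        # exponential backoff

# Example use (hypothetical operation):
# retry(lambda: get_object("shares/se/sefeslgzc4su3i66b72aytmebm/0"), attempts=3)
```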